B. The Most Successful Microprocessors and Their Interface Circuits
Muhammad El-Saba,
mhs1308@gmail.com
Covers
Intel 8086/8088/80286/80386/80486,
Pentium, Pentium II, Pentium III, Pentium 4, Xeon, Itanium,
Intel Atom, Core, Core2, Core i7, Core i9 Processors,
Latest AMD, ARM and SPARC Processors
2002-2020
Introduction to Microprocessors & Interface Circuits INDEX
CONTENTS
Subject Page
Preface
Preamble
2-10. Architecture of Intel’s Pentium 4 Microprocessor 71
2-11. Intel’s Core and Core2 Microprocessors 76
2-12. Intel’s Core i5, Core i7 and Core i9 Microprocessors 77
2-13. Architectures of 64-bit Microprocessors 77
2-14. Architecture of AMD K10 78
2-15. Summary of Intel & AMD Architectures 79
2-16. Evolution of 80x86 from CISC to RISC Architecture 86
2-17. Architecture of RISC Processors (ARM & SPARC) 88
2-17.1. Architecture of ARM Processors 88
2-17.2. Architecture of SPARC Processors 92
2-17.3. Architecture of Super-SPARC Processors 93
2-17.4. Architecture of SPARC64 Processors 94
2-17.5. Architecture of UltraSPARC64 Processors 98
2-17.6. Multithreading Technology 100
2-18. CPU Market Share 101
2-19. Moore’s Law 104
2-20. Summary 105
2-21. Problems 112
2-22. References 114
CH3: Memory Organization and Segmentation 115
3-1. Memory Segmentation in Computer Systems 117
3-1.1. Virtual Memory 117
3-1.2. Physical Memory 117
3-1.3. Memory Paging 117
3-2. Memory Segmentation in x86 Systems 118
3-2.1. Flat Memory Model 118
3-2.2. Segmented Memory Model 118
3-3. Operation Modes of x86 Processors 119
3-3.1. Real Mode 120
3-3.2. Protected Mode 120
3-3.3. Virtual Mode 121
3-3.4. Long & Legacy Modes of x86-64 Processors 122
3-4. Memory Addressing of x86 Processors 123
3-4.1. Real Mode Addressing (Generating 20-bit Address) 124
3-4.2. Protected Mode Addressing (Generating 32-bit Address) 127
3-4.3. Memory Paging in Protected Mode 131
3-4.4. Protection Aspects 132
3-4.5. Privilege Levels 133
3-4.6. Entering and Leaving the Protected Mode 134
3-4.7. Protected Multitasking 135
3-4.8. Virtual Mode 135
3-4.9. Physical Address Extension (PAE) 135
3-4.10. Long Mode Addressing in x86-64 Architecture 135
3-4.11. Long Mode Memory Management 137
3-4.12. RIP-Relative Addressing 138
3-5. Stack Operation 139
3-5.1. Setting-up a Stack 141
3-5.2. Stack Operations 142
A. Push Operation 142
B. Pop Operation 144
3-5.3. Illustration Examples 145
3-5.4. Stack and Calling Procedures 146
3-5.5. Stack Behavior in 64-Bit Mode 147
3-6. IBM Memory Organization 148
3-7. SPARC Memory Models and Addressing Space 150
3-7.1. SPARC Memory Modes 150
3-7.2. SPARC Addressing Space 152
3-8. ARM Memory Organization 153
3-8.1. ARM Registers 153
3-8.2. ARM Stack 155
3-9. Summary 156
3-10. Problems 160
3-11. References 162
CH4: Microprocessor Instructions 163
4-1. Introduction 165
4-2. Data Types (Bytes, Words, Integers, Floating-point numbers, …) 165
4-3. Instruction Format of x86 Processors 172
4-4. Addressing Modes of x86 Processors 173
4-5. Intel’s 8086/80186/80286/80386/80486 Instruction Set (Alphabetical) 175
4-6. Basic Instruction Set of x86 Processors (by category) 180
4-6.1. Data Transfer Instructions 181
4-6.2. Arithmetic Instructions 187
4-6.3. Logic Instructions 190
4-6.4. String Instructions 193
4-6.5. Program Control Instructions 196
4-6.6. Processor Control Instructions 197
4-7. Math Coprocessor (x87) Instructions 199
4-8. Subroutine Calls & Interrupts in Assembly Language 201
4-8.1. Subroutine Calls (CALL) 202
4-8.2. Interrupts (INT) and interrupt vector table (IVT) 202
4-8.3. Masking Interrupts (Turning Interrupts Off) 206
4-8.4. Interrupts Priority 207
4-9. IBM PC Interrupts & DOS Calls 208
4-9.1. PC Boot Process 208
4-9.2. PC Interrupt Service Routines (ISR’s) 209
4-9.3. BIOS Calls & DOS Calls 213
4-10. Interrupts in Protected Mode 215
4-10.1. Gates 215
4-10.2. Interrupt Descriptor Table (IDT) 216
4-10.3. Interrupt Masking in Protected Mode 216
4-10.4. Debugging in Protected Mode 217
4-11. New Instruction Sets of x86-64 Architecture 218
4-11.1. Media Instructions 219
4-11.2. Floating-Point Instructions 219
4-12. Summary of the Recent x86 Instructions 221
4-12.1. MMX Instructions 221
4-12.2. Streaming SIMD (SSE) Instructions 221
4-12.3. SSE2 Instructions 222
4-12.4. SSE3 Instructions 223
4-12.5. SSE4 Instructions 223
4-13. Undocumented x86 Instructions 224
4-14. Converting Assembly Language to Machine Code 224
4-15. Case Study: Encoding the MOV Instruction 229
4-16. Execution Time of x86 Instructions 231
4-17. Instruction Set of SPARC Processors 233
4-18. Instruction Format of SPARC Processors 236
4-19. Encoding Load / Store Instructions of SPARC Processors 240
4-20. Instruction Set of ARM Processors 243
4-21. Summary 250
4-22. Problems 253
4-23. References 257
CH5: Assembly Language Programming, Compilation & Debugging 259
5-1. Introduction 261
5-2. DEBUG Program 262
5-3. Macro Assembler Programs 266
5-4. Assembly Language Instruction Format 267
5-5. Assembler Data Types 269
5-6. Assembler Directives 269
5-7. Declaring Variables 272
5-8. Modifiers & Attribute Operators 273
5-9. Difference between Values, Addresses and Pointers 275
5-10. Arrays in Assembly Language 276
5-11. Tables & Lookup Tables in Assembly Language 278
5-12. Other Data Structures in Assembly Language 278
5-12.1. Queues 279
5-12.2. Linked Lists 279
5-12.3. Hash Tables 282
5-12.4. Binary Trees 282
5-13. Working with Strings in Assembly Language 284
5-14. Procedures in Assembly Programs 286
5-15. Functions in Assembly Programs 286
5-16. Writing & Initializing Interrupts in Assembly Programs 287
5-17. Creating Macros in Assembly Programs 289
5-18. Assembly Program Compilation & Linking 291
5-19. 16-Bit Macro-Assemblers (MASM16, TASM16) 292
5-20. MASM Syntax for x86 Memory Addressing Modes 293
5-21. 32-Bit Macro-Assemblers (MASM32) 298
5-22. 64-Bit Macro-Assemblers (YASM) 300
5-23. Summary of x86 Macro-assembler Programs 301
5-24. Summary 302
5-25. Problems 303
5-26. References 306
CH6: Writing Assembly Routines within C/C++ and Java Programs 307
6-1. Introduction 309
6-2. General Considerations (16-bit, 32-bit and 64-bit programs) 310
6-2.1. Using YASM Assembler with Visual Studio and VC++ 310
6-2.2. I/O Software Layers 311
6-2.3. I/O in DOS, and Windows 311
6-2.4. Direct Memory Access (ActiveX and all that Stuff) 312
6-3. C-Programming Language (Summary) 314
6-3.1. Data Types in C-language 315
6-3.2. Variable Declaration in C-language 316
6-3.3. Expressions 316
6-3.4. Operators 316
6-3.5. Conditional Execution & Branching 318
6-3.6. Looping Instructions 319
6-3.7. Functions Declaration (Prototyping) and Definition 319
6-3.8. Derived Types 321
6-3.9. Data Structures 321
6-3.10. Accessing Structure Members 322
6-3.11. Pointers in C-language 322
6-3.12. Utilization of Pointers with Structures 322
6-3.13. Input / Output in C-Language 324
6-3.14. C-Preprocessor Directives 329
6-4. C++ and Object-Oriented Programming 331
6-4.1. Object-Oriented Programming (OOP) 331
6-4.2. Classes in C++ 333
6-4.3. Class Constructors and Destructors 334
6-4.4. Specific Operators in C++ 335
6-4.5. Input / Output in C++ 335
6-4.6. Inheritance in C++ 337
6-4.7. Polymorphism in C++ 340
6-4.8. Abstract Classes in C++ 342
6-4.9. Operator Overloading 342
6-4.10. Friend Functions in C++ 344
6-4.11. Generic Types (Templates) in C++ 344
6-4.12. Additional Notes about C/C++ 347
6-4.13. Common Problems in C/C++ 347
6-4.14. C++11 347
6-5. Programming under Windows 348
6-5.1. Windows Messaging System 348
6-5.2. Writing Windows DLL in C/C++ and Assembly 351
6-6. Writing Assembly Blocks inside C/C++ Programs 352
6-6.1. The __asm Keyword in Visual C/C++ 352
6-6.2. Using C or C++ Symbols in __asm Blocks 353
6-6.3. Writing Functions with Inline Assembly 353
6-6.4. Accessing C or C++ Data in __asm Blocks 354
6-6.5. Jumping to Labels in Inline Assembly 356
6-6.6. Calling C-Functions in Inline Assembly 347
6-6.7. Calling C++ Functions in Inline Assembly 348
6-6.8. Interrupts in Inline Assembly 358
6-7. Java-Programming Language (Summary) 360
6-8. Java versus C++ (Comparison) 405
6-9. Java versus C# (Comparison) 406
6-10. Invoking Assembly Language Programs from Java 408
6-11. Summary 415
6-12. Problems 417
6-13. References 420
CH7: Memory Interfacing with x86 Microprocessors 421
7-1. Introduction 423
7-2. Bus Timing of Memory Read/Write Operations 425
7-2.1. Memory Read Timing 425
7-2.2. Memory Write Timing 426
7-2.3. Wait States in 80x86 Microprocessors 427
7-2.4. Pentium Processor Bus Timing 428
7-2.5. Bus Cycle Time & Bus Bandwidth of 80x86 Processors 428
7-3. Memory Address Decoding 430
7-4. ROM Interfacing 431
7-5. RAM Interfacing (SRAM, DRAM) 434
7-5.1. SRAM Interfacing 435
7-5.2. SRAM versus Cache Memory 436
7-5.3. DRAM Interfacing (EDO, SDRAM, DDR, RAMBUS, DDR2) 439
7-5.4. DRAM Interfacing with 16-bit Data Bus 447
7-5.5. DRAM Interfacing with 32-bit Data Bus 447
7-5.6. DRAM Interfacing with 64-bit Data Bus 448
7-5.7. DRAM Modules 448
7-5.8. DRAM Controllers 452
7-6. Memory Requests 454
7-7. Checking Memory Errors 456
7-7.1. Parity Checking 456
7-7.2. Errors Checking and Correction (ECC) 457
7-8. Serial Memory Devices 458
7-9. Secondary Memory 460
7-9.1. Magnetic Storage Devices 461
i. Magnetic Tapes 461
ii. Magnetic Disk Drives 462
7-9.2. Optical Memory & Compact Disks (CD) 464
7-10. Mobile Memory Modules 466
7-10.1. SRAM Cards 466
7-10.2. Flash Memory Cards 466
7-10.3. USB Flash Memory Drives 472
7-11. Summary 475
7-12. Problems 479
7-13. References 481
CH8: I/O Interfacing Circuits for 80x86 Microprocessors 483
8-1. Introduction (I/O Transfer Modes) 485
8-2. Methods of Addressing I/O Ports 486
8-2.1. I/O address Space 486
8-2.2. Memory-mapped I/O 487
8-3. I/O Instructions 487
8-3.1. Register I/O Instructions 487
8-3.2. Block I/O Instructions 489
8-4. Protected I/O 491
8-5. Designing I/O interfaces for 80x86 systems 492
8-5.1. Implementing Simple Input Ports Using 74LS244 Buffers 492
8-5.2. Implementing Simple Output Ports Using 74LS373 Latch 493
8-6. Using 8255 Programmable Peripheral Interface (PPI) Chip 497
Example 8-1. Basic I/O Mode 499
Example 8-2. Basic I/O Mode 501
Example 8-3. Keyboard Scanner & 7-Segment Display 502
Example 8-4. Square Wave generator (BSR Mode). 506
Example 8-5. Input from ADC 507
Example 8-6. Stepper Motor Control 509
8-7. I/O with Handshaking Capabilities 512
8-7.1. I/O with Handshaking Capabilities 514
Example 8-7. I/O with handshaking (Mode 1) 516
8-7.2. Bidirectional I/O with Handshaking Capabilities 517
Example 8-8. Bidirectional I/O with handshaking (Mode 2) 519
8-7.3. CPU Services for I/O Control 520
8-8. I/O – Memory Interface & Direct Memory Access (DMA) 521
8-8.1. The DMA Chip (8237) Architecture 521
8-8.2. How Does DMA Work? 522
8-8.3. DMA Usage in IBM PC 523
8-8.4. DMA Modes of Operation 524
8-8.5. Programming The DMA 527
8-9. I/O Processors 528
8-9.1. Features of IOP's 528
8-9.2. Intel 8089 IOP 529
8-9.3. Intel 80321 IOP 530
8-10. Summary 531
8-11. Problems 532
8-12. References 535
CH9: Interfacing with IBM PC & Compatibles 537
9-1. Introduction (Overview of the IBM PC) 539
9-2. PC Motherboard 540
9-3. Busses & Expansion Slots 544
9-4. History of PC Buses 546
9-4.1. PC (8-bit) Bus 547
9-4.2. ISA (16-bit) Bus 550
9-4.3. Proprietary Buses & their Problems 552
9-4.4. MCA & EISA (32-bit) Buses 551
9-4.5. VESA Local Bus 551
9-4.6. PCI (64-bit) Bus 553
9-4.7. Accelerated Graphics Port (AGP) & PCI-X (128-bit) Bus 556
9-4.8. PCI Express Bus 558
9-4.9. IEEE-488 (GP-IB) Bus 559
9-4.10. SMBus 560
9-4.11. I2C Bus 560
9-4.12. JTAG (IEEE 1149.1) Bus Architecture 561
9-4.13. Controller Area Network (CAN) 562
9-4.14. Local Interconnect Network (LIN) 562
9-4.15. Multi-Bus Architecture 564
9-4.16. Bus Hierarchy 565
9-4.17. Bus Topologies 566
9-4.18. PCMCIA & ExpressCard 568
9-4.19. PC I/O Extension Cards 568
9-5. PC Serial Ports 570
9-5.1. Introduction to Serial Communications & RS232 570
9-5.2. UART Chips 573
9-5.3. Description of the Serial Port 575
9-5.4. How Many Wires do We Need for a Serial Connection? 576
9-5.5. Addressing the Serial Port 577
9-5.6. Programming the Serial Port 577
9-5.7. Universal Serial Bus (USB) 579
9-5.8. USB to RS232 Interface 582
9-5.9. Other Serial Bus Standards (ACCESS.bus, FireWire, IrDA) 583
9-5.10. PC-to-PC Communication (Networking & Ethernet) 586
9-5.11. Switching Networks 591
9-6. PC Parallel Ports 593
9-6.1. Parallel Port Architecture 594
9-6.2. IBM-PC Parallel Port Cable 595
9-6.3. Parallel port I/O Addressing 596
9-6.4. Parallel Port Timing Diagram 597
9-6.5. Programming the Parallel Port 597
9-6.6. Recent Improvements in the PC Parallel Port 599
9-6.7. Parallel Port I/O under Windows 9X, 2K, XP 601
9-7. Attaching a Mass Storage Device to IBM PC 603
9-7.1. Historical Development of ATA and IDE 603
9-7.2. Parallel ATA (PATA) Interface 606
9-7.3. Serial ATA (SATA) Interface 607
9-7.4. Comparison between ATA, SATA, SCSI and USB 608
9-8. Keyboard Interface 610
9-8.1. Keyboard Operation 611
9-8.2. Detailed Operation of the PS/2 Keyboard 614
9-8.3. Keyboard Protocol & Data Format 614
9-8.4. Keyboard Connectors 615
9-8.5. Keyboard BIOS Calls 616
9-8.6. Keyboard Interface Circuits 617
9-9. Mouse Interface 619
9-10. Video Monitor Interface Circuits 622
9-10.1. Video Adaptor Standards 622
9-10.2. Video Monitors & Connectors 624
9-10.3. BIOS Video Interface & Interrupt 10H 627
9-10.4. Graphics Processing Unit (GPU) & Graphics Accelerators 629
9-11. Summary of I/O Addresses in IBM PC & Compatibles 630
9-12. Summary 632
9-13. Problems 635
9-14. References 637
CH 10: Microprocessor Support Chips & PC Chipsets 639
10-1. Introduction 641
10-2. The 8237 DMA Controller 642
10-3. The 8250A UART Chip 643
10-4. The 8251A USART Chip 646
10-5. The 16550 UART Chip 647
10-6. The 8253 Programmable Interval Timer (PIT) 647
10-7. The 8255 Programmable Peripheral Interface (PPI) 650
10-8. The 8259 Programmable Interrupt Controller (PIC) 652
10-9. The 8279 Keyboard / Display Controller 657
10-10. The 8288 Bus Controller 659
10-11. The 8289 Bus Arbiter 661
10-12. The 8275 & 6845 CRT Controllers 662
10-13. Universal Peripheral Interface (UPI), Intel 8742 Chip 664
10-14. Graphics Processing Units (GPU) & Graphics Accelerators 666
10-15. IBM PC Chipsets 668
10-16. Case Study: Intel 82845 Chipset 669
10-16.1. North-Bridge Chipset (MCH) 669
10-16.2. South-Bridge Chipset (ICH) 670
10-17. Case Study: Intel DG965 Chipset 673
10-18. Case Study: AMD 690G Chipset 673
10-19. Case Study: Intel X58 (Core i7) Chipset 675
10-20. Case Study: Apple PC Chipsets 677
10-21. Case Study: Intel Z170 (Skylake) Chipset 679
10-22. Summary 681
10-23. Problems 686
10-24. References 689
CH 11: Microprocessor Selection Guide 691
11-1. Intel Microprocessors Selection Guide 693
11-2. AMD Microprocessors Selection Guide 700
11-3. SPARC Microprocessors Selection Guide 704
11-4. ARM Microprocessors Selection Guide 706
11-5. Processor Performance Factors 706
11-6. Benchmarks 708
11-7. Microprocessor Packages & Marking 712
11-8. Processor Sockets 713
11-9. Processor Bus Speeds 718
11-10. Processor Overclocking 718
11-11. Processor Supply Voltages 719
11-12. Processor Cooling 719
11-13. Summary 721
11-14. Problems 724
11-15. References 725
Appendices 727
Appendix A: Quick Reference of 8086/8088 Instruction Set. 729
Appendix B: Basic Instruction Set of x86 Microprocessors. 737
Appendix C: Flag Reference of x86 Instructions 797
Appendix D: Math Coprocessor (x87) Instructions 801
Appendix E: Basic Instruction Set of SPARC Processors. 809
List of Figures
Figure Figure Caption Page
Chapter 1.
Fig. 1-1 Schematic of a microprocessor system. 4
Fig. 1-2 Photograph of Intel's first microprocessor, the 4004 7
Fig. 1-3 Chronological evolution of Intel's 80x86 family 8
Fig. 1-4 Photograph of some old AMD processors 10
Fig. 1-5 Photograph of the ALTAIR 8800 computer 12
Fig. 1-6 Photograph of IBM's first personal computer (IBM PC 5150) 13
Fig. 1-7 Schematic diagram of a simple microprocessor 15
Fig. 1-8 Schematic diagram of a memory system 18
Fig. 1-9 Flowchart of the program and the equivalent C-language code 20
Fig. 1-10 Flowchart of the microprocessor operation 24
Fig. 1-11(a) Block diagram of a hardwired controller 25
Fig. 1-11(b) Block diagram of a microprogrammed controller 26
Fig. 1-12 Architecture of a microcomputer system (Software & Hardware) 27
Fig. 1-13 Compilation and linking of high-level languages 32
Chapter 2.
Fig. 2-1 Basic architectures of microprocessors 39
Fig. 2-2 Pin-out diagram of the Intel 8086 microprocessor 40
Fig. 2-3 Architecture of the Intel 8086/8088 microprocessors 42
Fig. 2-4 Block Diagram of the 74LS374 octal latch 44
Fig. 2-5 Address de-multiplexing from address/data lines of 8086 processors 45
Fig. 2-6 Block Diagram of the 74LS245 octal tri-state buffer 45
Fig. 2-7(a) Generating control bus signals in 8086 minimum mode 46
Fig. 2-7(b) Generating control bus signals in 8088 minimum mode 46
Fig. 2-8 Illustration of the FLAGS register in Intel 8086/8088 microprocessors. 48
Fig. 2-9 Handling hardware interrupts in 8086/8088 systems, using the 8259 chip 49
Fig. 2-10 Gating the hardware Interrupts inside the 80x86 systems 49
Fig. 2-11 RESET, CLK and READY pins in 8086/8088 microprocessors 50
Fig. 2-12 Architecture of the 8087 math coprocessor 51
Fig. 2-13 Connection of the 8087 with 8088 microprocessor, in maximum mode 52
Fig. 2-14(a) Architecture of the 80286 microprocessor 54
Fig. 2-14(b) Machine status word register (MSW) in the 80286 microprocessor 54
Fig. 2-15(a) Pinout diagram of the 80386DX microprocessor 55
Fig. 2-15(b) Internal architecture of the 80386 microprocessor 56
Fig. 2-16(c) Internal registers in 80386 microprocessors. 58
Fig. 2-16(a) Structure of the EFLAGS register in 80386 microprocessors 59
Fig. 2-16(b) Structure of the CR0 control register in 80386 microprocessors 60
Fig. 2-17 Architecture of the Intel 80486 microprocessor 61
Chapter 3.
Fig. 3-1(a) Basic memory models 115
Fig. 3-1(b) Basic memory operation modes in x86 processors 116
Fig. 3-1(c) Illustration of the operation modes in x86-64 processors. 118
Fig. 3-1(d) Virtual memory space in x86-64 microprocessors 119
Fig. 3-2(a) Memory segmentation in x86 systems (Real mode). 121
Fig. 3-2(b) Addressing a Memory Location inside a Segment, by adding an offset 122
Fig. 3-2(c) Addressing a memory location inside a segment 122
Fig. 3-3(a) Generation of the 32-bit address offset in 80386 and later processors 124
Fig. 3-3(b) Segmented address generation in real and protected modes 124
Fig. 3-4(a) Segment selector architecture 125
Fig. 3-4(b) Fields in a descriptor table (segment descriptor). 125
Fig. 3-4(c) Linear address generation mechanism in protected memory mode. 125
Fig. 3-4(d) Combining the 32-bit effective address with the segment selector 126
Fig. 3-5(a) One of the page directory records (translation table entry). 127
Fig. 3-5(b) Illustration of the paging mechanism 128
Fig. 3-6 Privilege levels in 80386 (and later processor) systems 129
Chapter 4.
Fig. 4-1(a) Byte and word organization in memory, with little endian representation 158
Fig. 4-1(b) Packed and unpacked binary-coded decimal (BCD) number representation 160
Fig. 4-1(c) Fixed-point number representation (signed magnitude form). 161
Fig. 4-1(d) Floating-point number representation, as two fixed point numbers 161
Fig. 4-1(e) Floating-point number representation, in IEEE 754 format, for 32-bit 162
Fig. 4-1(f) Floating-point number representation, in IEEE 754 format, for 64-bit 163
Fig. 4-1(g) Floating-point number representation, in IEEE 754 format, for 80-bit 163
Fig. 4-2 Fundamental instruction format for x86 microprocessors 164
Fig. 4-3 Basic addressing modes of the x86 microprocessors 165
Fig. 4-4 Sequence of operations for executing the instruction: MOV AX,[BX+3]. 166
Fig. 4-5 BCD to 7-segment code translation 175
Fig. 4-6 Sign extension and zero extension, from 1 byte to 2 bytes 178
Fig. 4-7(a) Illustration of MUL BX instruction, where BX contains 0100 180
Fig. 4-7(b) Illustration of MUL and DIV instructions, with different operand sizes 181
Fig. 4-8 Shift and Rotate operations in 80x86 microprocessors 184
Fig. 4-9 String transfer (MOVS) or comparison (CMPS) operations, when DF = 0 186
Fig. 4-10 Schematic representation of simple and nested subroutine calls 193
Fig. 4-11 Schematic representation of interrupt vector table 197
Fig. 4-12(a) IBM PC memory map after boot-up process 202
Fig. 4-12(b) IBM PC layered architecture 205
Fig. 4-13 Architecture of the interrupt descriptor table registers 208
Fig. 4-14 Debug registers in 80386 and higher processors 209
Fig. 4-15(a) Instructions format of x86 processors (16-bit instructions). 217
Fig. 4-15(b) Instructions format of x86 processors (32-bit instructions). 217
Fig. 4-16 Encoding the MOV instruction of x86 processors 221
Fig. 4-17 Instruction formats of SPARC processors: Format 1,2,3 228
Chapter 6.
Fig. 6-1 Piece of a C++ code and its equivalent binary code 293
Fig. 6-2(a) Operating system layered structure of an IBM PC 295
Fig. 6-2(b) Operating system layered structure, in old and recent PC’s. 296
Fig. 6-3 Representation of an array of data elements in C-Language 300
Fig. 6-4 Main components of object-oriented programming technology 316
Fig. 6-5 Block diagram of a typical Windows application program 333
Fig. 6-6 Compilation and interpretation of Java programs 344
Fig. 6-6 Representation of an array of ten data elements 344
Fig. 6-7 Class hierarchy in java.lang package 372
Fig. 6-8 GraphicObject Class hierarchy 387
Fig. 6-9 Accessing JNI functions 393
Fig. 6-10 Windows MessageBox called from Java 398
Chapter 7.
Fig. 7-1 Memory organization of a computer system 407
Fig. 7-2 Primary, secondary and tertiary memory devices 408
Fig. 7-3 Memory interface circuit to the Intel 8088 microprocessor in IBM PC 409
Fig. 7-4(a) Timing diagram of memory read cycle in 8086/8088 microprocessors 410
Fig. 7-4(b) Timing diagram of memory write cycle in 8086/8088 microprocessors 411
Fig. 7-5 Timing diagram of memory read cycle in Pentium microprocessors 412
Chapter 8.
Fig. 8-1 I/O interfacing in a microprocessor system 467
Fig. 8-2(a) Implementation of an input port using the 74LS244 octal buffer 473
Fig. 8-2(b) Implementation of a simple input port using 74LS244 and 8 dip switches 474
Fig. 8-3(a) Implementation of an output port using the 74LS373 octal latch 475
Fig. 8-3(b) Implementation of a simple output port using 74LS374 and 8 LED’s 475
Fig. 8-3(c) Implementation of an output port using octal latch and 7-segment display 476
Fig. 8-3(d) Output port with 7-segment circuits and a BCD-to-7-segment decoder 477
Fig. 8-3(e) Output driving circuits 477
Fig. 8-4 Pin-out diagram of the 8255 PPI (or PIO) chip. 478
Fig. 8-5 Control register word of the 8255 PIO chip. 479
Fig. 8-6 Connecting the 8255 PIO chip with the microprocessor busses 481
Fig. 8-7 Connecting the 8255 with the microprocessor busses 482
Fig. 8-8(a) Implementation of a 16-key keypad interface, using the 8255 PIO chip 484
Fig. 8-8(b) Flowchart of the KEY procedure 485
Fig. 8-9 Schematic of the square wave generator by the 8255 in BSR Mode 488
Fig. 8-10 Implementation of an ADC interface, using the 8255 PIO chip 489
Fig. 8-11(a) Schematic of the stepper motor driver circuit 490
Fig. 8-11(b) Schematic of the stepper motor driver circuit and the layout of ULN2003 491
Fig. 8-11(c) Cross section of a permanent magnet bipolar stepper motor 492
Fig. 8-12(a) Schematic representation of data transfer using strobing mechanism 493
Fig. 8-12(b) Schematic representation of data transfer with handshaking 494
Fig. 8-12(c) Data transfer from CPU (via PPI) to Output device with handshaking 494
Fig. 8-12(d) Data transfer from an Input device to CPU (via PPI) with handshaking 495
Fig. 8-13 Handshaking signals of the 8255 PIO chip 496
Fig. 8-14 Timing diagram of the 8255 PIO chip in mode 1 497
Fig. 8-15 Data output from the 8255 PIO chip to a line Printer with Handshaking 498
Fig. 8-16 Bidirectional I/O operation (Mode 2) of PA with handshaking 499
Fig. 8-17(a) Pin-out diagram of the Intel 8237 DMA controller 502
Fig. 8-17(b) Architecture of the Intel 8237 DMA controller 503
Fig. 8-18 Connection of the Intel 8237 DMA controller with the 8088 505
Fig. 8-19 Connection of an I/O Processor (IOP) to a host CPU, via a local bus 510
Chapter 9.
Fig. 9-1 Overview of an old desktop IBM PC 519
Fig. 9-2 General block diagram of an IBM PC, with peripheral devices. 520
Fig. 9-3(a) Motherboard of the first IBM PC (1981) and its schematic layout. 521
Fig. 9-3(b) Motherboard of a Pentium-based IBM PC. 522
Fig. 9-3(c) Intel motherboards: DG965 for Intel Core2 Duo microprocessor. 522
Fig. 9-3(d) Intel motherboards: for Intel Core i7 microprocessor. 523
Fig. 9-4 Installing an expansion card into an expansion slot. 525
Fig. 9-5 Different standard shapes of expansion cards 526
Fig. 9-6 Overview of the S-100 interface bus and an S-100 card 527
Fig. 9-7(a) Overview of the ISA interface bus and an ISA card 528
Fig. 9-7(b) Description of the ISA bus. 529
Fig. 9-8 Overview of the MCA bus and the EISA bus. 531
Fig. 9-9(a) Overview of the VESA local bus (VLB). 531
Fig. 9-9(b) Details of the VESA local bus (VLB). 532
Fig. 9-10(a) Pins of the original 32-bit PCI bus 534
Fig. 9-10(b) I/O Pins and corresponding signals of the 32-bit PCI bus 534
Fig. 9-11 Different shapes of PCI slots and cards. 536
Fig. 9-12(a) Illustration of the AGP slot on the motherboard 537
Fig. 9-12(b) AGP Architecture and its connection to the PCI bus 537
Fig. 9-13 Slots of the PCI Express bus 531
Fig. 9-14 GPIB bus and how it connects devices to a PC, via a GPIB cable 540
Fig. 9-15 Simplified model of I2C bus 541
Fig. 9-16 Simplified model of the JTAG bus 541
Fig. 9-17 Simplified model of the CAN bus 542
Fig. 9-18 PC expansion slots. 543
Fig. 9-19 Different busses and I/O devices in the IBM PC 544
Fig. 9-20 Bus topologies 545
Fig. 9-21 Circuit diagram and layout of an ISA bus extension card 547
Fig. 9-22 Transmission of serial data, through serial ports. 549
Fig. 9-23 OSI model of a computer communication system 550
Fig. 9-24 Serial data frame 551
Fig. 9-25 Connection of the 8250 UART with a MODEM, for serial communication 554
Fig. 9-26 DB9 serial cable interface 555
Fig. 9-27 Typical serial cable connection 556
Fig. 9-28 USB connectors 559
Fig. 9-29 USB cable lines 560
Fig. 9-30 USB 3.0 male plugs 561
List of Tables
Table Table Title Page
Chapter 1.
Table 1-1 First Intel microprocessors 7
Table 1-2 First Motorola microprocessors. 9
Table 1-3 First Zilog microprocessors. 9
Table 1-4 First AMD microprocessors. 10
Table 1-5 First SPARC microprocessors. 11
Table 1-6 Instruction set of a simple microprocessor. 21
Table 1-7 Unix and its variants 30
Chapter 2.
Table 2-1 Pin assignment of 8086 microprocessor, in the minimum mode 41
Table 2-2 BHE pin signals 43
Table 2-3 Control signals of the 8086/8088, in minimum and maximum modes. 44
Table 2-4 Pin assignment of the 80386 microprocessors. 56
Table 2-5 Special registers in the 80386 microprocessors 60
Chapter 3.
Table 3-1 Segment registers and their typical offsets in x86 microprocessors 121
Table 3-2 Register usage in legacy and 64-bit operation modes 124
Table 3-3 ASI values for different addressing spaces in SPARC processors 148
Chapter 4.
Table 4-1 Instruction set of x86 microprocessors, arranged in alphabetical order. 167
Table 4-2 Variant data transfer instructions, in x86 microprocessors 173
Table 4-3 Stack instructions, in x86 microprocessors 177
Table 4-4 Arithmetic instructions, in 80x86 microprocessors 179
Table 4-5 Logic instructions, in x86 microprocessors 183
Table 4-6 String instructions in x86 microprocessors 185
Table 4-7 Program control instructions in 80x86 microprocessors 188
Table 4-8 Processor control instructions in 80x86 microprocessors 189
Table 4-9 Privileged instructions in x86 microprocessors 191
Table 4-10 Floating point instructions format of x87 coprocessors 192
Table 4-11 First 10 exceptions and interrupts in x86 systems 196
Table 4-12 Summary of main Interrupt vectors, in IBM PC & compatible computers 204
Table 4-13 Undocumented instructions of the 80x86 processors 216
Table 4-14 REG field bits, in the addressing mode byte of x86 instructions 218
Table 4-15 MOD field bits, in the addressing mode byte of x86 instructions 219
Table 4-16 R/M field bits in the addressing mode byte of 80x86 instructions 219
Table 4-17 Effective calculation time in x86 processors (with no pipelining) 224
Chapter 5.
Table 5-1 DEBUG program Instructions. 246
Table 5-2 Different possible formats of an assembler line 252
Table 5-3 Summary of the x86 macro assembler directives and pseudo-ops 255
Table 5-4 Summary of the most famous macro-assemblers, for x86 processors. 285
Chapter 6.
Table 6-1 Variable types in C-language 299
Table 6-2 Arithmetic operators in C-language 301
Table 6-3 Logical operators in C-language 301
Table 6-4 Relational operators in C-language 304
Table 6-5 Bit-level operators in C-language 302
Table 6-6 Escape sequences and string format identifiers in C language 308
Table 6-7 Data conversion specifiers in C language 309
Table 6-8 File I/O modes in C-language 311
Table 6-9 Reserved constants for file search (fseek) in C-language 312
Table 6-10 Standard I/O streams in C language 313
Table 6-11 Standard I/O streams in C++ language 320
Table 6-12 Access rights and inheritance in C++ 321
Table 6-13 Brief list of important messages of Windows operating systems 334
Table 6-14 Basic data types in Java 345
Table 6-15 Java operators 347
Table 6-16 Converters and flags used in TestFormat.java 385
Chapter 7.
Table 7-1 Pin assignment of the 2716 EPROM 417
Table 7-2 Pin assignment of the 41256 DRAM 427
Table 7-3 Bandwidth of most well-known DRAM types and their peak value 429
Table 7-4 Memory bank selection, in 16-bit data bus PC systems 430
Table 7-5 Specifications for SDRAM (DDR and DDR2) modules 434
Table 7-6 List of the most famous flash memory cards 453
Table 7-7 Pin assignment of the 2 MB Flash memory HY27HU08AG 455
Table 7-8 Summary of memory technologies and their applications 460
Chapter 8.
Table 8-1 Port selection map of the 8255 PIO chip 479
Table 8-2 Stepping modes of a stepper motor 492
Table 8-3 Direct memory access (DMA) channels usage 504
Chapter 10.
Table 10-1 Intel microprocessor support chips 614
Table 10-2 Registers of UART chips 610
Table 10-3 Bits of the Interrupt Enable Register (IER) 617
Table 10-4 Interrupts in the IBM PC 625
Table 10-5 Interrupt Requests (IRQs) in a recent IBM PC 628
Table 10-6 Input status and output control signals of the 8288 chip. 631
Table 10-7 Description of pins of the 8742 638
Table 10-8 List of recent PC chipsets and their characteristics 646
Chapter 11.
Table 11-1 INTEL Processors, from 1971 to 2008. 660
Table 11-2a Examples of Intel Mobile Processors 661
Table 11-2b Intel Desktop Processors. 662
Table 11-2c Intel Core i7 Desktop Processors 662
Table 11-2d Intel Laptop Processors 663
Table 11-2e Intel Laptop Processors (Cont) 664
Table 11-2f Intel Workstations and Server Processors. 664
Table 11-2g Intel Workstations and Server Processors (Cont.) 665
Table 11-3 AMD Processors, from 1975 to 2008. 666
Table 11-4 SPARC Processors from 1987 to 2008. 670
Table 11-5 Benchmarks of different processors 674
Table 11-6 Intel CPU sockets 681
Table 11-7 AMD CPU sockets. 681
PREFACE
Introduction to Microprocessors & Interface Circuits deals with the general
principles of microprocessor design and interfacing by looking at the famous
x86 microprocessors (from Intel and AMD) and their associated peripheral
interface chips. For the sake of comparison, I also briefly introduce the
architecture of ARM and SPARC processors. My goal in this book is
primarily educational. The book aims to give electrical engineering students a
general understanding of microprocessor system design and programming
techniques. The architecture, operation and programming of x86
microprocessors, as well as their interface circuits, are all covered in a
didactic manner throughout this book. In particular, we focus on the x86 microprocessors,
which are certainly worthy of study. In fact, the generic term x86 refers to
the instruction set of the most commercially successful CPU architecture in
the history of personal computing. The x86 assembly language is depicted in
order to emphasize the sequence of operations and their implications on the
hardware. We look at important concepts such as address decoding, memory
and input/output devices interfacing as well as data communication and
handshaking mechanism. Furthermore, we look at the PC architecture and
operation. This should enable the student to enter the workplace with
microprocessor design skills, and an understanding of microprocessor-based
applications. Assembly programming is satisfactorily explained in the
course. My choice has been to present a large set of assembly language
examples, which illustrate the various design options and possibilities, both
in instruction sets and in overall configurations. I make use of the DEBUG
utility to show what action the instruction performs, and then provides
sample assembly programs to show its application. The given examples are
actual programs, taken from the technical literature and manuals, offering
students a fun, hands-on learning experience. The knowledge of assembly
language will help the reader write better programs, even with high-level
languages, such as C, C++ and Java. The book covers, in eleven chapters
and seven appendices, the hardware and software design issues, which are
needed for building a complete microprocessor-based system. However, this
is not a dedicated book about the assembly language. It is not a dedicated
book about the PC, neither. Rather, it is a one-stop source on
microprocessors that uses an easy-to-understand, step-by-step approach to
teaching the fundamentals of assembly language programming and the PC
architecture.
The learning objectives are stated at the beginning of each chapter. These
learning objectives serve as a preview of the information the student is
expected to learn in the chapter. Each chapter ends with a summary
and ample problems to test the student's understanding. The questions are
based on the learning objectives. On successful completion of this book, the
student will be able to:
o Analyze the performance of a microprocessor-based system and assess
the contribution of each part to overall system performance.
o Describe and analyze the effect of hardware limitations on the
performance of the microprocessor system and appreciate how the
designer can overcome such limitations.
o Predict the way in which trends in microprocessor architecture and
peripheral design will be incorporated in the next generation of
microprocessor systems.
I have goals for the book in addition to the educational ones. I think the book
can serve as a useful reference for practicing electronics and computer
engineers. Behind the goal of the book as a guide for the computer designer
lies the feeling that the field of computer engineering needs to develop a
sense of history and of looking to the past for guidance. The fantastic
advance in basic logic technology (in speed, cost, and reliability) makes each
day seem an absolutely new one. Thus, we have the goal of saving some of
the past and letting our engineers catch the technology train for our future
needs in the computer industry. This goal is mixed with a certain archival
feeling. We are all trying to increase the productivity of the creative work
of our society in general, and of our engineers in particular, by arming them
with the necessary tools to conquer the computer (hardware and software)
design arena.
Preamble
This book refers to many public domain web sources and textbooks in the
field of microprocessors and assembly programming. We don’t claim any
major original scientific contribution in this book. In fact, almost all the
information in this book can be found elsewhere in many public domain
resources. In recognition of this, all these resources and references
are collected at the end of the book. We intended to write the book in
a friendly manner, as an introductory textbook, and we intentionally did not
mean it to look like a scientific paper in a specialized journal. The footnotes
are restricted to the explanation of technical terms whose explanation
would distract the reader if it were inserted in the main text.
However, a great effort has been exerted, throughout several years of hard
work, so that the huge amount of information contained in this book is
arranged and presented in a didactic manner. An additional effort has been
exerted to supplement all this information with pedagogic discussion,
illustrative figures, and solved examples graded in difficulty.
Introduction to
Microprocessors
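[Figure 1-1 diagram: the CPU, drawn as an execution unit (EU) and a bus interface unit (BIU), with the data bus, address bus and control bus (RD/WR, read and write lines) and a CLOCK input.]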
Fig. 1-1. Schematic of a microprocessor system. The central processing unit (CPU) is
generally composed of two units: bus interface unit (BIU) and execution unit (EU).
The control unit (CU) is the traffic cop of any microprocessor. The CU
implements the microprocessor instruction set. It handles the order of
execution of programs, and fetches and decodes the instructions to be
executed. Based on the bit combinations of the instructions, the CU
moves data around the microprocessor and sends the necessary signals to
the ALU to perform the operation needed. After instructions are executed,
the CU sets various signals, called flags, that indicate the status of the
microprocessor and of the executed instructions. More advanced CUs can
respond to unplanned events inside and outside of the microprocessor
through interrupts. The interrupts cause the CU to invoke special
programs to deal with these events. More advanced CUs also use pipelining
to handle the execution of multiple instructions at once, pre-execute
instructions, and predict jump sequences.
1
The 4004 was the first CPU on a single chip. It had about the same computing performance as the first
large-scale digital computer, ENIAC. However, the CPU of ENIAC was built with thousands of vacuum
tubes, on an area of about 180 m², while the 4004 was built on a single chip of a few mm².
The Intel 4040 (1974) was an enhanced version of the 4004, adding 14
instructions, 8K program space, a larger stack (8 level), and interrupt
abilities (including shadows of the first 8 registers). The Intel 8008 was
the first 8-bit microprocessor. It was introduced in 1972 and offered
roughly twice the processing power of the Intel 4004. The Intel 8008
microprocessor was able to perform 50,000 instructions per second.
Fig. 1-2. Photograph of Intel's first microprocessor, the Intel 4004, and the latest
Intel Core i9 X-series.
The Intel 8080 was the first general-purpose 8-bit microprocessor. It was
introduced in 1974 and was ten times faster than the 8008. Its 16-bit
address bus allowed the 8080 to access up to 64 kB of memory. So, it
was utilized in the first microcomputer kits, like the Altair and the IMSAI.
Later, Intel, with Hewlett-Packard, developed a generation of processors with
a 64-bit architecture called IA-64 (the older 80x86 design was renamed IA-32).
The following chronological list depicts the microprocessors that
followed the 8080. Figure 1-3 also depicts the chronological evolution of
the Intel microprocessors.
Processor     Data bus   Address bus   Transistors    Clock     Year
8086          16-bit     20-bit        29,000         5 MHz     1978
8088          8-bit      20-bit        -              -         1979 (first IBM PCs, 1981)
80286         16-bit     24-bit        134,000        6 MHz     1982
80386         32-bit     32-bit        275,000        -         1985
80486         32-bit     32-bit        1.2 million    -         1989
Pentium       64-bit     32-bit        3.1 million    60 MHz    1993
Pentium III   64-bit     32-bit        9.5 million    450 MHz   1999
Pentium 4     64-bit     32-bit        42 million     1.5 GHz   2000
Xeon          64-bit     32-bit        -              -         2001
Pentium M     64-bit     32-bit        -              -         2002
Itanium       64-bit     -             25 million     -         2003
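The address-bus widths listed above fix how much memory each processor can address directly: an n-bit address bus selects 2^n byte locations. The short Python sketch below works this out for a few of the widths quoted in the text. The function names are illustrative only, and binary units (1 KB = 1024 bytes) are assumed.

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses selectable with an n-bit address bus."""
    return 2 ** address_bits


def human_size(num_bytes: int) -> str:
    """Format a byte count using binary units (1 KB = 1024 bytes)."""
    units = ["bytes", "KB", "MB", "GB", "TB"]
    index = 0
    while num_bytes >= 1024 and index < len(units) - 1:
        num_bytes //= 1024
        index += 1
    return f"{num_bytes} {units[index]}"


# Address-bus widths taken from the text: 8080, 8086/8088, 80286, 80386.
for name, bits in [("8080", 16), ("8086/8088", 20), ("80286", 24), ("80386", 32)]:
    print(f"{name}: {bits}-bit address bus -> {human_size(addressable_bytes(bits))}")
```

This reproduces the 64 KB limit quoted for the 8080 and gives 1 MB, 16 MB and 4 GB for the 20-, 24- and 32-bit address buses.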
The first IA-64 implementation was named Itanium and was intended to
be compatible with the 80x86. Itanium is a VLIW (very long instruction
word) machine and can handle six instructions simultaneously. The
lineup of Core processors includes the Intel Core i3, Core i5, Core i7, and
Core i9, as well as the X-series of Intel Core CPUs. As of 2021,
the x86 architecture is used in most high-end computers, including cloud
computing systems, servers, workstations, personal computers and laptops.
The thing that really made the Z-80 popular in designs was the built-in
memory interface. The Z-80 CPU generated its own RAM refresh signals,
which meant easier design and lower system cost. This was a deciding
factor in its selection for the Radio Shack microcomputer TRS-80.
Although Zilog made several attempts to move off the Z80 onto more
powerful 16-bit (Z800, Z8000) and 32-bit (Z80000) platforms, other
companies were offering CPUs in this performance range years earlier,
and the Zilog chips never caught on.
Processor   Data bus   Address bus   Year
Z80         8-bit      16-bit        1976
Z8000       16-bit     24-bit        1979
Z80000      32-bit     24-bit        1986
Processor Year
Am2900, 4-bit slice microprocessor [1975]
AMD 29000 series, aka 29K [1987-1995]
x86 processors second-sourced, under contract with Intel [1979-1991]
Amx86 series [1991–95]
K5 series [1995]
K6 series [1997-2001]
K7 series [1999-2003]
K8 series [2003-now]
K10 series [2007-now]
Other companies introduced their own versions of Intel’s 8080, like the
National Semiconductor IMP-8 and the Fairchild F-8. However, only
Intel and Motorola continued to create and improve new microprocessors.
Zilog has instead concentrated on microcontrollers, which are complete
microcomputer systems on a single chip. On the other hand, Texas
Instruments has concentrated on the production of RISC processors
(SPARC) and digital signal processors (DSPs).
2
Altair was developed by Edward Roberts, William Yates and Jim Bybee
Another early innovation was by Jobs and Wozniak with their invention
of the Apple II. This simple microcomputer used a 6502 processor, rather
than the M6800 (don't ask why!). It had a ROM BIOS-based operating
system3, and a BASIC program interpreter. The latter made it very easy to
operate, by supplying an easy method for programming and controlling it.
After the introduction of Apple II in 1977, the individual user became a
new target in the computer industry. IBM, the major computer
manufacturer at that time, needed to react quickly. Thus, IBM decided to
develop the Personal Computer. IBM outsourced the production of
microprocessor chips to Intel and the operating system to Microsoft.
Fig. 1-9. Photograph of the first IBM personal computer (IBM PC 5150).
3
ROM stands for Read-Only Memory and BIOS stands for Basic Input/Output System.
Earlier microprocessors were based around the idea that making the CPU
support a larger number of advanced and complex instructions would
lead to increased performance. This idea is at the root of CISC systems.
The Intel x86 and the Motorola 680x0 are CISC systems. For instance,
the Intel x86 processors have more than three hundred instructions. On
the other hand, RISC systems are characterized by a small set of short
primitive instructions of fixed length, from which a computer
programmer can build more complex routines and programs. The
SPARC, PowerPC, MIPS, DEC Alpha and ARM processors have RISC
architectures. Such RISC systems are sometimes referred to as load/store
systems. The so-called single-instruction computers (SIC) are extreme
cases of RISC machines whose instruction set is reduced to a minimum.
CISC machines have become more and more complex with
each new generation. For instance, x86 instructions vary in length from
one to over a dozen bytes. This increases the functionality of such
processors and enables their compatibility with older generations. RISC
design techniques offer power even in small sizes, and thus have become
dominant for low-power 32-bit CPUs. As of 2007, the x86 designs are as
fast as the fastest available RISC solutions.
4
MS-DOS stands for Microsoft Disk Operating System.
Let's assume that both the address and data buses are 8-bits wide in this
example. Here are the components of this simple microprocessor:
Although they are not all shown in the above diagram, there are control
lines from the instruction decoder to perform the following functions:
Coming into the instruction decoder are the bits from the test register and
the clock line, as well as the bits from the instruction register.
i) Code segments
ii) Data segments
iii) Stack segments
The code segments hold the main program code, while the data segments
hold data. The stack segments hold the necessary parameters to handle
Hexadecimal numbers are base-16 numbers which are represented by a string of hexadecimal digits
followed by the character H. A hexadecimal digit is a character from the set (0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
A, B, C, D, E, F). In some cases, a leading zero is added if the number would otherwise begin with one
of the digits A-F. For example, 0FH is equivalent to the decimal number 15.
If you are familiar with C-programming language, then you know that the
following simple piece of C-code, in figure 1-11, will calculate the
factorial of 7 (factorial 7 = 7! = 7* 6 * 5 * 4 * 3 * 2 * 1 = 5040). Here,
we declared two integer variables, namely: a and F. At the end of the
program execution, the variable F will contain the factorial of 7.
Fig. 1-12. Flowchart of the program and the equivalent C-language code.
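The C listing itself did not survive in this copy, so the loop is sketched here in Python instead. This is a minimal sketch, assuming the loop shape suggested by the flowchart (multiply, then a = a + 1, until a exceeds 7). The variable names a and F come from the text; the function name is illustrative.

```python
def factorial_by_loop(limit: int) -> int:
    """Accumulate the factorial of `limit` with the counting loop of the flowchart."""
    a = 1  # loop counter, called a in the text
    F = 1  # running product, called F in the text
    while a <= limit:  # the loop ends once a > limit
        F = F * a
        a = a + 1
    return F


print(factorial_by_loop(7))  # prints 5040
```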
11 SAVEC 129
12 LOADA 128 ; a = a+1;
13 CONB 1
14 ADD
15 SAVEC 128
16 JUMP 6 ; loop back to compare ( if a >7 condition)
17 STOP
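To make the surviving part of the listing concrete, the sketch below interprets the four instructions that implement a = a + 1 (lines 12 to 15). The instruction meanings are assumptions inferred from the mnemonics and comments: LOADA loads register A from a memory address, CONB loads a constant into register B, ADD places A + B in register C, and SAVEC stores C to memory. JUMP is left out because the earlier lines of the program are not available here. Registers are kept to 8 bits, following the 8-bit data bus assumed for this simple microprocessor.

```python
def run(program, memory):
    """Run a list of (mnemonic, operand) pairs on a toy three-register machine."""
    a = b = c = 0
    pc = 0
    while pc < len(program):
        op, arg = program[pc]
        if op == "LOADA":      # assumed: A <- memory[arg]
            a = memory[arg]
        elif op == "CONB":     # assumed: B <- constant
            b = arg
        elif op == "ADD":      # assumed: C <- A + B, truncated to 8 bits
            c = (a + b) & 0xFF
        elif op == "SAVEC":    # assumed: memory[arg] <- C
            memory[arg] = c
        elif op == "STOP":
            break
        else:
            raise ValueError(f"unsupported instruction: {op}")
        pc += 1
    return memory


# Lines 12 to 15 of the listing, followed by STOP; address 128 holds a.
increment_a = [("LOADA", 128), ("CONB", 1), ("ADD", None), ("SAVEC", 128), ("STOP", None)]
memory = run(increment_a, {128: 3})
print(memory[128])  # prints 4
```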
So now the question is, "How do all of these instructions look in ROM?"
Each of these assembly language instructions must be represented by a
binary number. For the sake of simplicity, let's assume that each assembly
language instruction is given a unique code (or number), like this:
Table 1-6. Instruction set of a simple microprocessor.
The above code numbers are usually called opcodes. Our little program,
in ROM would look like the second column (Opcode/Value) in the
following list (in binary form, without comments):
You can see that the internal seven lines of C-code became 18 lines of
assembly language, and that became 32 bytes in ROM.
The instruction decoder needs to turn each of the opcodes into a set of
signals that drive the different components inside the microprocessor.
Let's take the ADD instruction as an example and see what it needs to do:
1. During the first clock cycle, we need to fetch (or load) the
instruction. Therefore the microprocessor needs to:
o activate the tri-state buffer for the program counter (PC), to point
to the instruction (ADD) location in memory,
o activate the RD line,
o activate the data-in tri-state buffer,
o read and latch the instruction into the instruction register.
2. During the second clock cycle, the ADD instruction is decoded
(and executed). It needs to do very little:
o set the operation of the ALU to addition,
o latch the output of the ALU into the C register.
3. During the third clock cycle, the program counter (PC) is
incremented.
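The three clock cycles above can be mirrored step by step in a few lines of Python. This is only an illustrative model under stated assumptions: the state dictionary, the register names A, B, C, PC and IR, and the ROM contents are hypothetical stand-ins for the hardware latches and buffers, and the result is truncated to 8 bits.

```python
def run_add_cycles(state, rom):
    """Apply the three clock cycles of ADD to a register-state dictionary."""
    trace = []

    # Cycle 1: fetch. The PC addresses memory, RD is asserted, and the
    # instruction is latched into the instruction register (IR).
    state["IR"] = rom[state["PC"]]
    trace.append("fetch")

    # Cycle 2: decode and execute. The ALU is set to addition and its
    # output is latched into register C.
    if state["IR"] != "ADD":
        raise ValueError("this sketch only models the ADD instruction")
    state["C"] = (state["A"] + state["B"]) & 0xFF
    trace.append("execute")

    # Cycle 3: the program counter is incremented.
    state["PC"] += 1
    trace.append("increment PC")
    return trace


state = {"PC": 14, "IR": None, "A": 3, "B": 1, "C": 0}
print(run_add_cycles(state, {14: "ADD"}), state)
```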
[Flowchart: Start; check whether any instruction is waiting; if yes, fetch the instruction and execute it; check for a HALT instruction; check whether any I/O interrupts are waiting.]
Hardwired control.
Microprogramming.
There are many ways in which the above microprocessor design can be
improved. For instance, the registers A, B and C may be replaced by a
larger set of general-purpose registers. Also, the microprocessor can be
designed to handle input/output (I/O) interrupt events that may occur
during the program execution and to transfer control to their service
subroutines. The device requesting an interrupt can identify itself by
sending a special code to the CPU over the bus. The code, supplied by the
interrupting device, may represent the starting address of an interrupt
service routine (ISR).
MAC-OS from Apple, OS/2 and OS/400 from IBM.6 Once upon a time,
not so long ago, everyone knew what an operating system was. It was the
complex software sold by the maker of your computer system, without which
no other program could function on the computer. Its duties included
spinning the disks, monitoring the terminals, and generally keeping track
of what the hardware was doing. Application (user) programs
asked the operating system to perform various functions, and users
seldom talked to the OS directly. Today those boundaries are not quite
so clear. The rise of graphical user interfaces, macro and scripting
languages, and the increased popularity of networks have all
blurred the traditional distinctions. Today's computing environments
consist of layers of hardware and software that interact to form a
working whole.
6
All these names are trademarks of their corresponding companies
1. Process Management
o Process is a program in execution
o Process creation/deletion (book-keeping)
o Process suspension/scheduling,
o Process synchronization
o Process communication
2. Memory Management
o Maintain bookkeeping information
o Map processes to memory locations
o Allocate/de-allocate memory space as requested/required
3. I/O Device Management
o Disk management functions such as free space management,
storage allocation, fragmentation removal, head scheduling
o I/O device interface through buffering/caching, custom drivers for
each device.
4. File System (Built on top of disk management)
o File creation/deletion.
o Support for hierarchical file systems
o Update/retrieval operations: read, write, append, seek
o Mapping of files to secondary storage
5. Protection (Controlling access to the system)
o Resources --- CPU cycles, memory, files, devices
o Users --- authentication, communication
6. Network Management (Often built on top of file system)
o TCP/IP, IPX, IPng
o Connection/Routing strategies
o Communication mechanism
o Data/Process migration
7. Network Services (Built on top of networking)
o Email, messaging (Exchange)
o FTP
o WWW and Gopher
o Distributed file systems --- NFS, AFS, LAN Manager
o Name service --- DNS, NIS
o Security --- Kerberos
8. User Interface
o Character-based shells such as sh, and command.com
o GUI --- XWindows, Win32
1-9.1. UNIX
UNIX (or UNICS7) was initially developed in 1969 and released in 1971
by the AT&T engineers Ken Thompson and Dennis Ritchie to run on the
DEC PDP-7. When Thompson went to the University of California at
Berkeley (UCB) to teach for a year, one of his students (Bill Joy)
developed the ex editor, and the first release of the Berkeley Software
Distribution (BSD) appeared in 1977. UNIX BSD was licensed to several
companies and further editions of UNIX followed (see table 1-2). For
instance, SCO developed its first Unix system, called SCO XENIX
System V, for Intel x86 processor-based PCs. Also, Sun Microsystems
developed SunOS, which was later renamed Solaris.
7
UNICS stands for UNiplexed Information and Computing Service, and changed later to UNIX
1-9.2. MS-DOS
The Microsoft Disk Operating System (MS-DOS) was the most
popular PC operating system for about two decades. In 1980 IBM
commissioned Bill Gates to produce an operating system for its new
PC. Bill Gates was known to IBM because he had written a version of the
BASIC language for the Intel 8080-based Altair. Because IBM’s
original PC had only 64K bytes of RAM and no hard disk, a powerful
operating system like UNIX could not be supported. Bill Gates did not
have time to develop an entirely new operating system, so his company,
Microsoft, bought an operating system from Seattle Computer Products,
which was originally called QDOS and later renamed 86-DOS. This product
was then modified, by flavoring it with some features from CP/M (from Digital
Research), and renamed MS-DOS. The first version of MS-DOS was
released in 1981. MS-DOS Version 1.0 occupied 12K bytes of memory
and supported only 160 KB diskettes.
1-9.3. Windows
The need to make computers accessible to those who want to employ
them as a tool, forced the development of graphical user interfaces (GUI)
like Windows. The first version of Windows appeared in November
1985. However, at least until version 3.11 (1994), Microsoft’s Windows
was not an operating system, but a front-end GUI. The first versions of
Windows enabled users to switch among several concurrently running
applications. The product included a set of desktop applications, such as a
file manager, notepad, calculator, clock, and communications programs.
Windows NT (new technology) was released in July 1993, and was the
first Windows operating system to add support for client/server
business applications. However, Windows NT shared many features with
the IBM OS/2, which appeared at almost the same time.
Figure 1-17 indicates the different steps, which are needed to translate a
high-level source program file (e.g., C-language source files *.C) into an
executable file (*.EXE).
The optimized RTL is then fed to the code generator, which produces
target object (binary) code.
1-11. Summary
The first single chip CPU (which has been called a microprocessor) was
the Intel 4004 that was introduced in 1971. The Intel 8008 was the first
8-bit microprocessor. The 8088 was used in the first successful IBM PC.
Today the microprocessor is one of the most commonly used electronic
components in PCs, communication and control systems, as well as
automotive applications (some cars have over 10 of them inside).
1-12. Problems
1-1) Draw a block diagram showing the architecture of a simple
microprocessor and explain briefly its main components. Explain why it
is necessary to have an address bus, a data bus and a control bus in
microprocessor systems.
1-2) An MPU is:
(a) the same as a microprocessor unit.
(b) made from more than one Central Processing Unit.
(c) a small, single chip computer.
(d) an abbreviation for main processing unit.
1-13. References
Microprocessor
Architecture
Prof. Dr. Muhammad El-SABA
Introduction to Microprocessors & Interface Circuits CHAPTER 2
2-1. Introduction
The microprocessor architecture is the framework and the conceptual
design of the microprocessor structure. There are two basic architectures
of microprocessors, namely:
Signal   Pin     Pin   Signal
GND        1      40   VCC
AD14       2      39   AD15
AD13       3      38   A16 / S3
AD12       4      37   A17 / S4
AD11       5      36   A18 / S5
AD10       6      35   A19 / S6
AD9        7      34   BHE / S7
AD8        8      33   MN/MX
AD7        9      32   RD
AD6       10      31   HOLD, RQ/GT0
AD5       11      30   HLDA, RQ/GT1
AD4       12      29   WR, LOCK
AD3       13      28   M/IO, S2
AD2       14      27   DT/R, S1
AD1       15      26   DEN, S0
AD0       16      25   ALE, QS0
NMI       17      24   INTA, QS1
INTR      18      23   TEST
CLK       19      22   READY
GND       20      21   RESET
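Table 2-1. Pin assignment of the 8086 microprocessor. Where a pin has two functions, the
first applies in the so-called minimum mode of operation and the second in maximum mode.
In minimum mode, pin 33 (MN/MX) should be connected to VCC; connecting it to ground
selects maximum mode.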
Figure 2-3 depicts the 8086 internal architecture. The Intel 8086
was based on the design of the Intel 8080 and Intel 8085 (it was
source compatible with the 8080) with a similar register set, but
was expanded to 16 bits.
The Bus Interface Unit (BIU) feeds the instruction stream to the
Execution Unit (EU) through a 6-byte prefetch queue, so fetch
and execution are concurrent (a primitive form of pipelining,
which is a method of parallel processing). The 8086
instructions vary in length from 1 to 6 bytes. The 8086 features
64k x 8-bit (or 32k x 16-bit) I/O ports and fixed vectored interrupts.
Table 2-3 depicts the assignment of pins 24-31 in minimum and
maximum modes. In minimum mode, the 8086/8088 processors work as
stand-alone processors, which generate their own control signals. So, in
minimum mode the control pins 24 to 31 carry the control signals
(INTA, ALE, DEN, DT/R, M/IO, WR, HOLD, HLDA), which are
handled by the 8086 itself (like the 8085A control signals). Thus, in minimum
mode, the 8085A peripheral support chips can also be used with the
8086/8088 microprocessors.
Table 2-3. Control signals of the 8086/8088, in minimum and maximum modes.
Pin   Minimum Mode                           Maximum Mode
31    HOLD (Hold for DMA request)            RQ/GT0 (Request/Grant)
30    HLDA (Hold Acknowledge)                RQ/GT1 (Request/Grant)
29    WR (Write control)                     LOCK (Lock bus)
28    M/IO (Memory or I/O control, 8086)     S2 (Status signal)
      IO/M (I/O or Memory control, 8088)
27    DT/R (Data transmit/receive)           S1 (Status signal)
26    DEN (Data enable)                      S0 (Status signal)
25    ALE (Address latch enable)             QS0 (Queue status)
24    INTA (Interrupt acknowledge)           QS1 (Queue status)
Figure 2-7 depicts how the control signals MEMR, MEMW, IOR,
and IOW can be generated from the IO/M, WR and RD signals, for
8086/8088 microprocessors in minimum mode. Note that the M/IO signal of
the 8086 is replaced with IO/M (inverted polarity) in the 8088. The IBM
PC/XT made use of the 8088 in its maximum mode, while the IBM PCjr
ran its 8088 in minimum mode.
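The decoding of the four command signals can be sketched as simple gate logic. The Python model below is an illustration, not the circuit of figure 2-7. It assumes the 8088 convention, where the IO/M line is high for an I/O access and low for a memory access, and treats RD, WR and all four outputs as active-low (0 means asserted). The function name and the `_n` suffix for active-low signals are illustrative choices.

```python
def decode_bus_signals(io_m: int, rd_n: int, wr_n: int) -> dict:
    """Derive active-low MEMR, MEMW, IOR and IOW from IO/M, RD and WR.

    Each output goes low only when its strobe is low AND the IO/M line
    selects the matching address space, which is an OR of the
    active-low terms.
    """
    mem_n = io_m       # low when a memory access is selected
    io_n = io_m ^ 1    # low when an I/O access is selected
    return {
        "MEMR_n": mem_n | rd_n,
        "MEMW_n": mem_n | wr_n,
        "IOR_n": io_n | rd_n,
        "IOW_n": io_n | wr_n,
    }


# A memory read: IO/M = 0, RD asserted (0), WR idle (1).
print(decode_bus_signals(0, 0, 1))
```

For the 8086, whose M/IO pin has the opposite polarity, the same sketch applies after inverting the first argument.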
The queue status lines QS0 and QS1 tell whether the 8086 is
taking its next byte from the internal byte queue or from external
memory. The READY pin is usually pulled low when the 8086 is
communicating with a slow memory or input/output device.
When the memory or input/output device finishes the read or
write operation, it pulls READY high.
Bit:   15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
Flag:   -   -   -   -  OF  DF  IF  TF  SF  ZF   -  AF   -  PF   -  CF
CF: Carry Flag (contains carry out of MSB of result)
PF: Parity Flag (indicates if result has even parity)
AF: Auxiliary carry Flag (contains carry out of bit 3 in AL)
ZF: Zero Flag (indicates if result is zero)
SF: Sign Flag (indicates if result is negative)
OF: Overflow Flag (indicates if overflow occurred in result)
IF: Interrupt Flag (indicates if interrupt is enabled or disabled)
DF: Direction Flag (controls pointer updates during string operation)
TF: Trap Flag (provides single-step execution capability for debugging)
Fig. 2-8. Illustration of the FLAGS register in Intel 8086/8088 microprocessors.
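The bit layout of figure 2-8 can be turned into a small decoder that reports which flags are set in a given 16-bit FLAGS value. In the Python sketch below the bit positions follow the figure; the dictionary and function names are illustrative.

```python
# Bit positions of the 8086 status and control flags, as in figure 2-8.
FLAG_BITS = {
    "CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7,
    "TF": 8, "IF": 9, "DF": 10, "OF": 11,
}


def decode_flags(value: int) -> dict:
    """Return the state (0 or 1) of every named flag in a FLAGS value."""
    return {name: (value >> bit) & 1 for name, bit in FLAG_BITS.items()}


def flags_set(value: int) -> list:
    """List the names of the flags that are set, from bit 0 upwards."""
    ordered = sorted(FLAG_BITS.items(), key=lambda item: item[1])
    return [name for name, bit in ordered if (value >> bit) & 1]


print(flags_set(0x0246))  # prints ['PF', 'ZF', 'IF']
```

In the example value 0x0246, bits 2, 6 and 9 are set, together with the unnamed bit 1, which the decoder ignores.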
The instruction pointer (IP) is a 16-bit register which contains the offset,
within the code segment, of the next instruction to be executed. The
microprocessor uses this register to sequence the execution of instructions.
The IP and FLAGS are two special registers on the 8086 CPU. You do not
access these registers the same way you access the other registers. Instead,
it is the CPU which manipulates these registers directly.
The 8086/88 microprocessors can handle hardware interrupts via the
INTR (interrupt request) and NMI (non-maskable interrupt) pins.
Interrupts are external events (like mouse movement or memory failure)
that need CPU attention. The NMI pin is edge-triggered; when it is
activated, the CPU will finish its current instruction and handle the
interrupt. Similarly, when INTR is activated by an external device, the
CPU will finish its current instruction and respond with an interrupt
acknowledge signal (INTA). The INTA signal is received by an interrupt
controller (the 8259 chip), as shown in Fig. 2-9. This chip puts an
interrupt vector byte on the data bus and the CPU uses it to determine the
address of the appropriate interrupt service routine (ISR).
The interrupt signal INTR can be masked (and hence ignored) if the interrupt
flag (IF) is cleared by the CLI instruction. Figure 2-9 depicts how the
microprocessor handles the hardware interrupt signals.
Fig. 2-9. Handling hardware interrupts in 8086/8088 systems, using the 8259
interrupt controller chip. The INTA signal and other control signals are generated in
the maximum mode via the 8288 bus controller.
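[Figure 2-10 diagram: the INTR request from the 8259 reaches the 8086 gated by the interrupt flag (IF) and by the end of execution of the current instruction; the 8086 answers with INTA.]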
Fig. 2-10. Gating the hardware Interrupts inside the 80x86 systems.
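The step from the vector byte to the ISR address can be illustrated numerically. In 8086 real mode each vector has a 4-byte entry in the interrupt vector table at the bottom of memory: the entry for vector n starts at address 4n and holds the 16-bit offset (IP) followed by the 16-bit segment (CS) of the routine. The Python sketch below is an illustration under these assumptions; the function names and the example table contents are made up.

```python
def vector_entry_address(vector: int) -> int:
    """Address of the 4-byte vector table entry for an 8-bit interrupt vector."""
    if not 0 <= vector <= 255:
        raise ValueError("the interrupt vector is a single byte (0-255)")
    return vector * 4


def isr_address(memory, vector: int) -> tuple:
    """Read CS:IP from the table and form the 20-bit physical ISR address."""
    base = vector_entry_address(vector)
    ip = memory[base] | (memory[base + 1] << 8)      # offset, low byte first
    cs = memory[base + 2] | (memory[base + 3] << 8)  # segment, low byte first
    physical = ((cs << 4) + ip) & 0xFFFFF            # segment * 16 + offset
    return cs, ip, physical


# Example table: vector 8 points to the made-up routine at F000:1234.
table = bytearray(1024)
table[0x20:0x24] = bytes([0x34, 0x12, 0x00, 0xF0])
cs, ip, physical = isr_address(table, 8)
print(f"{cs:04X}:{ip:04X} -> {physical:05X}")  # prints F000:1234 -> F1234
```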
The clock of the 8086/8088, applied at pin 19 (CLK), is equal to one third
of the crystal frequency of the clock generator. The CLK signal is usually
supplied via the 8284 clock chip, as shown in figure 2-11. For instance, if
the 8088 is to be operated at 5MHz, then the 8284 chip should be driven by a
15MHz crystal. The 8284 is also used to manage the READY and RESET signals
of the 8086/8088.
Fig. 2-11. RESET, CLK and READY pins in 8086/8088 microprocessors and their
connection to the 8284 clock chip.
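The divide-by-three relation can be checked with a few lines of Python. This is a minimal sketch, assuming only that the processor clock is one third of the frequency fed to the 8284; the function names are illustrative.

```python
def cpu_clock_hz(input_hz: float, divider: int = 3) -> float:
    """Processor clock obtained from the 8284 input frequency (divide by 3)."""
    return input_hz / divider


def clock_period_ns(clock_hz: float) -> float:
    """Duration of one processor clock cycle in nanoseconds."""
    return 1e9 / clock_hz


clock = cpu_clock_hz(15_000_000)
print(clock, clock_period_ns(clock))  # prints 5000000.0 200.0
```

With a 15 MHz input the processor runs at 5 MHz, so each clock cycle lasts 200 ns.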
The 8087 math coprocessor was one of the innovations of the x86 family
of microprocessors. The 8087 is capable of performing mathematical
floating point operations in 80-bit precision. It was designed to monitor
the instruction stream and watch for ESC sequences (instructions which
are preceded by an ESC prefix). Whenever an ESC sequence is detected,
the coprocessor knows that the following instruction involves a floating
point operation and can execute it more efficiently than the 8086/88. In
the meantime, the 8086/88 has to wait for the result of the 8087, by issuing
a WAIT instruction until the 8087 is done and its BUSY signal goes low.
Other (non-ESC) instructions are ignored by the 8087 and handled normally
by the 8086/88.
As shown in figure 2-12, the 8087 coprocessor has eight 80-bit data
registers, named ST(0) through ST(7), which can be accessed randomly
or as a stack. The connection of 8087 with 8086/88 microprocessor in
maximum mode is shown in the following figure.
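[Figure 2-13 diagram: the 8088 and the 8087 drawn side by side, with the AD0-AD15, BHE/S7, S0-S2, QS0-QS1 and RQ/GT0 lines connected between them and the 8087 BUSY output driving the 8088 TEST input; the RQ/GT1, INT, INTR, RDY and RESET pins are also shown.]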
Fig. 2-13. Connection of the 8087 math coprocessor with 8088 microprocessor, in
maximum mode.
The 80286 (also known as iAPX 286) is a 16-bit microprocessor that was
introduced in 1982. It was built in 1 micron NMOS technology with
134,000 transistors in a 68-pin quad-flat pack (QFP) package. The 80286
features a 16-bit data bus and a 24-bit address bus, and hence can address
up to 2^24 bytes (16 MB) of memory space. The processor was used in the IBM
PC/AT, which was introduced in 1984. The 80286 performance was
more than twice that of its predecessors (8086 and 8088) per clock cycle.
For instance, the model operating at 12 MHz had a benchmark of 2.66
MIPS (million instructions per second). Figure 2-14 depicts the
architecture of the 80286 microprocessor. As shown, the 80286 has 5
additional control registers for segmented memory management and
multiple processing:
MSW (80286)
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
- - - - PE
Fig. 2-14(c). Machine status word register (MSW) in the 80286 microprocessor.
The most important change, from the programmer's point of view, in the
80386 was the introduction of a 32-bit register set. The 16-bit AX, BX,
CX, DX, SI, DI, BP, SP, FLAGS, and IP registers were all extended to 32
bits. The 80386 calls these new 32-bit versions EAX, EBX, ECX, EDX,
ESI, EDI, EBP, ESP, EFLAGS, and EIP to differentiate them from their
16-bit versions (which are still available on the 80386). Besides the 32-bit
registers, the 80386 also provides two new 16-bit segment registers, FS
and GS, which allow the programmer to concurrently access six different
segments in memory without reloading a segment register.
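The relation between the extended registers and their 16-bit versions can be illustrated with a minimal Python sketch. It assumes the usual x86 layout, in which AX is the low half of EAX and AH/AL are the high and low bytes of AX; the function name sub_registers is hypothetical and chosen only for this illustration.

```python
def sub_registers(eax: int) -> dict:
    """Return the legacy views of a 32-bit EAX value (illustrative model only)."""
    eax &= 0xFFFFFFFF            # keep 32 bits
    ax = eax & 0xFFFF            # AX is the low 16 bits of EAX
    return {
        "EAX": eax,
        "AX": ax,
        "AH": (ax >> 8) & 0xFF,  # high byte of AX
        "AL": ax & 0xFF,         # low byte of AX
    }


views = sub_registers(0x12345678)
print({name: hex(value) for name, value in views.items()})
```

The same slicing applies to EBX, ECX and EDX; the upper 16 bits of an extended register have no separate name.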
Note that all the segment registers on the 80386 are still 16 bits. The
80386 did not extend the segment registers to 32 bits. The 80386
microprocessor extended the flags register to 32 bits (renamed EFLAGS)
and defined bits 16 and 17. Bit 16 of the EFLAGS register is the debug
resume flag (RF) used with the set of 80386 debug registers. Bit 17 is the
virtual 8086 mode flag (VM), which determines whether the processor is
operating in virtual-86 mode (that simulates an 8086) or standard
protected mode. Furthermore, the 80386 has additional special registers
for control and memory management. So, in addition to the 5 special
registers of the 80286 microprocessor: GDTR, LDTR, IDTR, TR, and
MSW, the 80386 has added 16 registers, as shown in table 2-5. As
shown, the 80386 added four control registers (CR0-CR3). These
registers extend the MSW register of the 80286 (the 80386 emulates the
80286 MSW register for compatibility, but the information really appears
in the CRx registers). These registers control functions such as paged
memory management, and protected mode operation.
EFLAGS (386+)
Bit:   15  14  13-12  11  10  9   8   7   6   5   4   3   2   1   0
Flag:  -   NT  IOPL   OF  DF  IF  TF  SF  ZF  -   AF  -   PF  -   CF
Bit:   31-22  21  20   19   18  17  16
Flag:  -      ID  VIP  VIF  AC  VM  RF
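To make the bit layout above concrete, the following Python sketch decodes an EFLAGS value into the names of the flags that are set. The bit positions follow the table; the helper name decode_eflags and the sample value are assumptions made for this example.

```python
# Bit positions taken from the EFLAGS layout; IOPL occupies bits 13-12.
EFLAGS_BITS = {
    0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF", 8: "TF", 9: "IF",
    10: "DF", 11: "OF", 14: "NT", 16: "RF", 17: "VM", 18: "AC",
    19: "VIF", 20: "VIP", 21: "ID",
}


def decode_eflags(value: int):
    """Return (list of set flag names, IOPL level) for a 32-bit EFLAGS value."""
    flags = [name for bit, name in sorted(EFLAGS_BITS.items())
             if (value >> bit) & 1]
    iopl = (value >> 12) & 0b11   # two-bit I/O privilege level
    return flags, iopl


# Hypothetical sample: PF, ZF and IF set, plus VM (bit 17).
print(decode_eflags(0x00020246))
```

Testing bit 17 (VM) in this way is how software would distinguish virtual-86 mode from standard protected mode.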
CR0 (386+)
31 30 29 ... 18 17 16 ... 10 9 8 7 6 5 4 3 2 1 0
PG CD NW … AM - WP … - - - - - NE ET TS EM MP PE
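As an illustration of how the PE and PG bits of CR0 select the memory management behaviour, here is a short Python sketch. Only the two bit positions come from the layout above; the function name and the returned strings are hypothetical labels used for this example.

```python
CR0_PE = 1 << 0    # Protection Enable (bit 0)
CR0_PG = 1 << 31   # Paging enable (bit 31)


def operating_mode(cr0: int) -> str:
    """Classify a CR0 value by its PE and PG bits (simplified model)."""
    pe = bool(cr0 & CR0_PE)
    pg = bool(cr0 & CR0_PG)
    if pg and not pe:
        # Paging requires protected mode, so this combination is not valid.
        raise ValueError("PG=1 with PE=0 is not a valid CR0 setting")
    if not pe:
        return "real mode"
    return "protected mode with paging" if pg else "protected mode without paging"


for value in (0x00000010, 0x00000011, 0x80000011):
    print(hex(value), "->", operating_mode(value))
```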
The Bus Interface Unit (BIU) retrieves code and data from RAM.
The BIU sends code along one 64-bit path to the 8kB I-Cache and sends
data along another 64-bit path to the 8kB D-Cache. The two caches
collect code and data until other components request them.
The BPU inspects the code in the I-Cache to determine which of the two
pipelines, or data paths, can most efficiently carry each instruction.
The instruction Pre-fetch Buffer obtains new instructions in 256-bit
bursts and the Instruction Decode Unit prepares the code for execution.
The FPU calculates any non-integer math and puts the result in D-Cache.
The two integer ALUs simultaneously take two sequential instructions
of up to 32 bits each from the Instruction Decode Unit.
The instructions are executed using data placed in the Execution Unit's
Registers from the D-Cache.
The D-Cache receives the results of the calculation. The Cache sends
the results to the BIU, which in turn stores the results to RAM.
The heat dissipation of the first Pentium processors was about 16 W, so they
required special cooling fans.
                      Instruction 1            Instruction 2
Task / Time Cycle     1    2    3    4    5    6    7    8    9    10
Instruction Fetch     IF1                      IF2
Instruction Decode         ID1                      ID2
Operand Load                    OL1                      OL2
Instruction Execute                  IE1                      IE2
Operand Store                             OS1                      OS2
(a) Non-pipelined processing

Task / Time Cycle     1    2    3    4    5    6    7    8    9    10
Instruction Fetch     IF1  IF2  IF3  IF4  IF5  IF6  IF7  IF8
Instruction Decode         ID1  ID2  ID3  ID4  ID5  ID6  ID7  ID8
Operand Load                    OL1  OL2  OL3  OL4  OL5  OL6  OL7
Instruction Execute                  IE1  IE2  IE3  IE4  IE5  IE6
Operand Store                             OS1  OS2  OS3  OS4  OS5
(b) Pipelined processing (with 4 pipes)
Fig. 2-19. Pipelined and Non-pipelined instruction processing.
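The timing shown in the figure can be checked with a small Python sketch. It assumes an ideal five-stage pipeline (fetch, decode, operand load, execute, operand store) with one cycle per stage and no stalls; the function names are hypothetical.

```python
def non_pipelined_cycles(instructions: int, stages: int = 5) -> int:
    """Each instruction occupies the processor for all of its stages."""
    return instructions * stages


def pipelined_cycles(instructions: int, stages: int = 5) -> int:
    """The first instruction needs `stages` cycles; each later one finishes
    one cycle after its predecessor (ideal pipeline, no stalls)."""
    if instructions <= 0:
        return 0
    return stages + instructions - 1


print(non_pipelined_cycles(2))   # two instructions, as in part (a)
print(pipelined_cycles(5))       # five instructions, as in part (b)
```

Under these assumptions, two instructions need 10 cycles without pipelining, while five instructions complete in 9 cycles with pipelining, which agrees with the position of OS5 in the figure.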
2-8. Architecture of Intel’s Pentium II Microprocessor
Pentium II is the successor of the Pentium (80586) and Pentium Pro
(80686) microprocessors. The first produced Pentium II processors were
code named Klamath. They were manufactured using a 0.35 micron
CMOS process with about 7.5 million transistors on a single chip. The
initial versions of Pentium II supported clock rates of 233 to 300 MHz at
a bus speed of 66 MHz. The second-generation Pentium II processors (code-named
Deschutes) were made with 0.25 micron CMOS technology and
supported clock rates of 350 to 450 MHz at a bus speed of 100 MHz. The
Pentium II can execute all the instructions of the earlier x86 processor
families. There are four versions targeted at different user markets. The
Celeron processor is the simplest and cheapest version. The standard
Pentium II is aimed at mainstream home and business users. The Pentium
II Xeon is intended for higher performance business servers. There is also
a mobile version of the Pentium II for use in portable computers.
The low cost Celeron may be sold as a card only without the box.
Consumer line Pentium II requires a 242-pin slot called Slot 1. The Xeon
processor uses a 330-pin slot called Slot 2. Intel refers to Slot1 and Slot2
as SEC-242 and SEC-330 in some of their technical documentation. The
daughter-board has mounting points for the Pentium II CPU itself plus
various support chips and cache memory chips. You can find a
recapitulation of the features of all the above mentioned sockets and
interfaces, as well as their photos, in Chapter 11 of this book.
The Pentium III also features a multi-transaction system bus, and MMX,
like the Pentium II. It adds 70 new instructions, Dual Independent Bus
(DIB) architecture, the Intel processor serial number, internet streaming
and Single Instruction/Multiple Data (SIMD) Extensions. Some Pentium
III versions also include an Advanced Transfer Cache and Advanced
System Buffering.
The 400 MHz system bus is a quad-pumped bus running off a 100 MHz
system clock making 3.2 GB/sec data transfer rates possible.
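The 3.2 GB/sec figure follows from simple arithmetic, sketched below in Python. The 64-bit (8-byte) width of the data bus is an assumption added here, since it is not stated in the paragraph above; the function name is hypothetical.

```python
def peak_bandwidth_bytes_per_s(clock_hz: int, transfers_per_clock: int,
                               bus_width_bits: int) -> int:
    """Peak transfer rate = clock x transfers per clock x bytes per transfer."""
    return clock_hz * transfers_per_clock * bus_width_bits // 8


# 100 MHz system clock, quad-pumped, assumed 64-bit data bus.
rate = peak_bandwidth_bytes_per_s(100_000_000, 4, 64)
print(rate / 1e9, "GB/s")
```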
Micro-operations (µOPs) is the name that Intel gives to the internal
instructions that can be directly understood by the execution units of
the microprocessor. These RISC-like instructions are very simple
and can be quickly carried out by the processor. Unlike x86
instructions, µOPs are of a defined size and can thus easily be fed
into the execution pipeline. The decoder translates an x86 instruction into
one or more µOPs, unless the x86 instruction is too complex. In this
case, the Micro Instruction Sequencer (MIS) has to produce a sometimes
rather long sequence of µOPs, using the Micro Code ROM (MCR). On
average, most x86 instructions are decoded into about two µOPs.
Most system bus signals of the Intel Pentium 4 processor in the 423-pin
package use the so-called Assisted Gunning
Transceiver Logic (AGTL) signaling technology. This signaling
technology provides improved noise margins and reduced ringing through
low voltage swings and controlled edge rates.
Unlike the P6 processor family, the termination voltage level for the
Pentium 4 processor AGTL signals is the VCC of the processor core. The
AGTL inputs require a reference voltage (called GTLREF), which is
used by the receivers to determine if a signal is a logic 0 or a logic 1.
GTLREF must be generated on the system board.
The Core architecture has a 14-stage pipeline, but it can fetch, decode and
issue up to 4 instructions per clock cycle. Like other recent PC processors,
Core2 translates x86 instructions (using a Code ROM) into RISC-like
short instructions (µOps). One new technology included in the Core
design is Macro-Ops Fusion, which combines two x86 instructions into a
single micro-operation. For example, a common code sequence like a
compare followed by a conditional jump would become a single µOp.
Other new features include 1 cycle throughput of all 128-bit SSE
instructions and a new power saving design.
The Core i9, with up to 18 cores, is Intel's fastest consumer processor yet.
In Intel's simple terms, the Core i9 is faster than the Core i7, which in
turn is faster than the Core i5. However, "faster" is not always "better".
Many people don't need such extra power, which unfortunately reduces
battery life in laptops.
Only the Core i7 and Core i9 series now support Hyper-Threading for
virtual cores. The new Core i5 series does not have it.
The AMD64 architecture has been cloned by Intel under the name Intel
64. This leads to the common use of the names x86-64 to collectively
refer to the two nearly identical implementations. Note that x86-64 is not
the same as IA-64, which is the architecture of Intel's Itanium processors.
• Intel Smart Cache, which allows for efficient data sharing between two
processor cores
• Improved decoding and SIMD execution
• Intel Dynamic Power Coordination and Enhanced Intel Deeper Sleep
to reduce power consumption
• Intel Advanced Thermal Manager, which features digital thermal
sensor interfaces
Fig. 2-27. Comparison between the execution cycles of RISC and CISC machines.
The first ARM processor, the ARM1, was a prototype that was never
released. The ARM2 was originally called the Acorn RISC Machine. It
was designed by Acorn Computers and used in the Archimedes, their
successor to the BBC Micro and BBC Master models, which were based on the
8-bit 6502 microprocessor. The ARM2 was clocked at 8 MHz, giving an average
performance of 4.7 MIPS. Development of the ARM family was then
continued by a new company called Advanced RISC Machines Ltd.
The ARM3 added a fully-associative on-chip cache and some support for
multiprocessing. This was followed by the ARM600 chip which was an
ARM6 processor core with a 4kB 64-way set-associative cache, an MMU
based on the MEMC2 chip, a write buffer and a coprocessor interface.
The ARM7 processor core uses half the power of the ARM6 and takes
around half the die size. In 1994 VLSI Technology, Inc. released the
ARM710 processor chip. The subsequent ARM11 micro-architecture
represented a major step in embedded systems. By scaling both the clock
frequency and the supply voltage, the developer can control power
consumption and performance. The first ARM11 processors were implemented
in 0.13 µm process technology and dissipate less than 0.4 mW/MHz when
they are powered by 1.2 V. Figure 2-28 depicts one of the ARM11
processors and a roadmap to over 1GHz. As shown, the ARM11
processor contains an AMBA interface, which improves memory bus
performance and facilitates "right-first-time" development of embedded
systems with multiple peripherals.
1- User mode,
2- Interrupt mode (with a private copy of R13 and R14),
3- Fast interrupt mode (private copies of R8 to R14) and
4- Supervisor mode (private copies of R13 and R14).
Fig. 2-29. Architecture of ARM1176JZ processor with ARM11 core.
1- User mode or
2- Supervisor mode.
In all SPARC machines there are 32 registers that the program can use at
the same time (actually, 28 of the 32 are generally available). Since the
SPARC has been optimized for subroutine calls, the actual number of
registers in the microprocessor is much larger than 32 (often 124 registers
exist), but only 32 registers are visible to the program at any given time
(see the subroutine overview section for more information). Registers
can be categorized by function since they are typically created for a given
purpose. On RISC machines, if the register is not being used for the
purpose it is designed for, it is available to use by any instruction. The
most general registers are the "temporary" registers. These registers exist
to store values loaded from memory or calculated by the ALU before
being stored in memory. These registers are not preserved across
subroutine calls. Basically, the program should use these as the "general
use" registers for the program.
The SPARC machine has only eight temporary registers designated %l0-
%l7 (registers 16-23), but the SPARC often has other registers available
that can also be used as temporaries if the program needs them. For
memory accessing and for subroutine calls, SPARC provide a few
specialized registers. Since they are keys to the operation of the machine,
the program should use the registers as designed, unless they are known
not to be needed. The SPARC provides a stack pointer register, %sp.
It also provides a frame pointer register, %fp. Like
MIPS, SPARC reserves one register for the constant zero. This register,
designated %g0 on SPARC, will always return zero when read from and
will not change if written to. To support subroutines, registers must be
available for passing arguments to the subroutine, returning
values from the subroutine and returning from the subroutine. The SPARC
machine has six registers for function arguments, designated %i0-%i5,
six registers for function return values, designated %o0-%o5, and a
function return address, designated %i7. When a subroutine call requires
more arguments than the machine provides registers for, then the program
must use the stack to save the additional information. The following
figure depicts the layout of the SPARC registers.
Fig. 2-31. Register organization of the Integer Unit (IU) of SPARC processors
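The naming of the 32 visible integer registers can be summarized by a short Python sketch. The text above gives %l0-%l7 as registers 16-23; the remaining assignments (globals 0-7, outs 8-15, ins 24-31, with %sp and %fp as aliases of %o6 and %i6) follow the standard SPARC convention and are stated here as background, not taken from the figure. The function name is hypothetical.

```python
GROUPS = ("g", "o", "l", "i")   # globals, outs, locals, ins: 8 registers each


def sparc_register_name(number: int) -> str:
    """Map a visible integer register number (0-31) to its assembler name."""
    if not 0 <= number <= 31:
        raise ValueError("only 32 integer registers are visible at a time")
    return f"%{GROUPS[number // 8]}{number % 8}"


# Conventional aliases: stack pointer and frame pointer.
ALIASES = {"%sp": sparc_register_name(14), "%fp": sparc_register_name(30)}

print([sparc_register_name(n) for n in (0, 16, 23, 31)])
print(ALIASES)
```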
Fig. 2-32. Block diagram of the SPARC64 architectures (VI and VII)
Since the sizes of data sets and programs continue to increase, 64-bit
addressing support became necessary, and SPARC took on this challenge.
Integer registers have become 64 bits wide; floating-point registers are
32, 64, or 128 bits wide.
Actually, the T1 cores are less complex than those of high end processors
in order to allow 8 cores to fit on the same die. The UltraSPARC T1 and
T2 are designed for single CPU systems. Recent UltraSPARC processors
such as Rock (2009) support multiple chip server architectures. The most
recent commercial iterations of the SPARC processors are the SPARC64 X
"Athena", introduced in 2012, and the 16-core SPARC T5, introduced by
Oracle Corporation in 2013 and running at 3.6 GHz.
Fig. 2-37. Illustration of the virtual threading (VMT) and simultaneous threading
(SMT) technologies.
Thus, Intel has earned the title of today's fastest x86 processor
developer. In the meantime, the AMD processors were pushed back and
became just a good solution for inexpensive systems. In order to
retain the sales volume, AMD undertook an unprecedented reduction of
the pricing on their solutions. For instance, the price of the AMD Phenom
X4 9600 (quad core with 512 kB x 4 L2-Cache at 2.3 GHz) was $145.99, as
declared on the AMD official website in 2009. The price of the
corresponding Intel Core2 Quad, running at 2.4 GHz, ranged from
$184.99 to $310, depending on the Cache size.
As for SPARC processors, they failed long ago on the desktop and
are still insignificant in the overall notebook market (despite the
availability of technically impressive products). Therefore, unlike Intel
and AMD architectures, SPARC is best viewed solely as a server
processor architecture. The market prospects for all servers (not just
SPARC) are driven by the following considerations.
In the mid-1960s, Gordon Moore (later chairman of the board of Intel) deduced
a principle or “law” which has continued to hold for over three
decades: the computing power and the complexity (roughly, the number
of transistors per chip) of the silicon integrated circuit microprocessor
double every one to two years, while the cost per CPU chip is cut in half.
This law is the main explanation for the computer revolution, in which
the Intel Architectures (IA) play such a significant role.
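The doubling rule can be expressed as a simple exponential, sketched below in Python. This is only a toy projection under an assumed doubling period; the starting value of 134,000 transistors is the 80286 figure quoted earlier in this chapter, and the result is not a measurement of any real chip.

```python
def projected_transistors(start_count: float, years: float,
                          doubling_period_years: float = 2.0) -> float:
    """Project a transistor count that doubles every `doubling_period_years`."""
    return start_count * 2 ** (years / doubling_period_years)


# Ten years of growth from 134,000 transistors, doubling every two years.
print(projected_transistors(134_000, 10))
```

Five doublings in ten years multiply the count by 32; with a one-year doubling period the same ten years would give a factor of 1024.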
2-20. Summary
In this chapter we described the architecture of some famous
microprocessors, with emphasis on x86 microprocessors. The jargon
computer terms, such as pipelining, threading and virtualization, which
usually appear in microprocessor datasheets and the advertisements of
CPU vendors, have been explained in a simple didactic manner.
The 80x86 is the generic name of Intel microprocessor architecture. The
generic term x86 refers to the instruction set of the most commercially
successful CPU architecture in the history of personal computing. The
Intel 8086 CPU was the first of the x86 architecture, which appeared in
1978. Three years later, the 8088 (an eight-bit data bus version of 8086),
was chosen as the main CPU for the IBM PC.
The architecture has twice been extended to larger word sizes. In
1985, Intel released the 32-bit 80386 to replace the 16-bit 80286. This
extension to the x86 architecture is commonly called IA-32 (Intel
Architecture, 32-bit). In 2003, AMD further extended the architecture to
64 bits, variously called x86-64 or AMD64 (and cloned by Intel as Intel 64).
Intel 64 should not be confused with the unrelated IA-64 architecture.
The x86 architecture is a variable instruction length, CISC design with
emphasis on backward compatibility. The instruction set is not typical
CISC however, but basically an extended and orthogonalized version of
the simple eight-bit 8085 architecture. Words are stored in little-endian
order and 16-bit and 32-bit accesses are allowed to unaligned memory
addresses. To conserve opcode space, most register-addresses are three
bits, and at most one operand can be in memory (in contrast with some
highly orthogonal CISC designs such as PDP-11 where both operands can
be in memory), but this memory operand may also be the destination,
while the other operand, the source, can be either register or immediate.
This contributes, among other factors, to a code footprint that rivals 8-bit
machines and enables efficient use of instruction cache memory. During
execution, current x86 processors employ a few extra decoding steps to
split most instructions into smaller pieces, micro-ops (µOps), which are
readily executed by a micro-architecture that may be described as a
RISC-machine without the usual load/store limitations. The small number
of general registers (inherited from 8085) has made register-relative
addressing (using small immediate offsets) an important method of
accessing operands, especially on the stack.
Much work has therefore been invested in making such accesses as fast as
register accesses, i.e. one cycle instruction throughput in most
circumstances. The following table summarizes the common architecture
steppings of the x86 processors:
The Intel Core (Solo and Duo) and dual-core Intel Xeon processor
are based on an improved Pentium M processor architecture.
The Intel Pentium dual-Core, Intel Core Duo, Core Quad and Core
Extreme, Intel Xeon 3x00 and 7x00 series processors are all based
on Intel Core Microarchitecture. The Intel Core2 Duo, Core2
Quad and Core2 Extreme, Xeon 5x00 series are based on
Enhanced Intel Core microarchitecture.
However, the segment registers (CS, DS, SS, ES) are still 16 bits wide for
compatibility with previous processor generations.
Multithreading: Studies have shown that even under full load, a typical
x86 server CPU is idle about 50% of the time. This is due to cache misses,
which all CPU architectures suffer from; they must wait for data to arrive
from RAM. However, CPUs belonging to the SPARC T1 family do not
suffer from this problem. Instead, as soon as a T1 thread stalls due to a
cache miss, the T1 switches threads in 1 clock cycle and continues to do
work while waiting for the data. Typically on a modern CPU, a thread
switch takes much longer than 1 clock cycle.
2-21. PROBLEMS
2-1) Draw a general block diagram describing the internal architecture of
the 8086 microprocessor and explain briefly each block.
2-2) Calculate the maximum address space of the 8086 microprocessor
and the maximum number of segments that can be located in this space.
2-3) List and explain the significance and use of the general-purpose
registers in the 8086 microprocessor.
2-4) Explain how the address/data lines can be de-multiplexed in 8086
and 8088 microprocessors using octal latches like the 8282 and bus
transceivers 8286 chips.
2-12) Check the right phrases with (√) sign and false ones with (x)
[1] The early Intel x86 processors were generally CISC processors [ ]
because they made use of complex instructions sets
[2] The 8088 is an 8-bit microprocessor while the 8086 is a 16-bit µP [ ]
[3] The Core 2 Duo is a dual core microprocessor, which belongs to [ ]
Intel's x86 microprocessors
[4] RISC processors are faster than CISC processors because they [ ]
use simpler fixed-length instructions and their architecture
enables pipelining and superscalar execution
[5] In the flat memory mode, the whole memory of a 80386 micro- [ ]
processor may be considered as one segment of 4GB
[6] The interrupt service routines are called by the CPU when IF=0 [ ]
[7] The AF is raised (AF=1) when the addition of 8-bit numbers [ ]
results in a carry
[8] The parity flag helps to correct memory errors, in x86 [ ]
microprocessor system
[9] Core2 Duo is a dual core microprocessor with shared L2-Cache [ ]
[10] The 80486 has a built-in FPU [ ]
2-13) Describe the main features of Pentium processors, with respect to
their precursors. What's the difference between Pentium 4 and Itanium
microprocessors?
2-14) Describe the meaning of the following terms:
Pipelining,
Super-scalar architecture,
SEC, SIMD, MMX, SSE, SSE2
2-15) What are the main power saving modes that are supported in
Pentium microprocessors?
2-16) Describe the operation of the main support chips in the 8086/8088
– based microcomputer systems, and show how they're interfaced to the
microcomputer system. Hint: The bus controller 8288, the programmable
timer / counter PTC 8253/8254, the programmable interrupt controller
PIC 8259, the programmable peripheral interface PPI 8255
2-17) Explain the difference between a directive, an operation, and an
instruction. Give an example of each.
2-18) How are the integer registers named on the SPARC?
2-19) How many integer registers are there on the SPARC? For each of
the integer registers that have special attributes, explain the special
attributes.
Introduction to Microprocessors & Interface Circuits CHAPTER 3
Memory Organization & Segmentation
The processor maps the flat memory space onto the physical address
space using a specific address translation mechanism. However, the
applications programmers do not need to know the details of the
mapping. Relocation of separately compiled modules in this space must
be performed by the operating system software (e.g., linkers, locators,
binders, loaders).
[Figure: memory address models, divided into the flat address model and the segmented address model]
Processors beginning with the Intel 80286 feature a second mode called
protected mode (or protected virtual address mode PVAM). In this mode
the 80286 can address up to 16 MB, using its 24-bit address bus. To ease the transition
to/from protected mode, Intel 80386 and later processors have been
provided with a third mode called "virtual 86", or simply V86 mode. The
x86-64 processors have two modes, namely: the long mode and legacy
mode. Each of these modes has sub-modes, which are backward
compatible with previous 16-bit real mode, 32-bit protected mode as well
as the virtual 86 mode.
The effective addresses generated by the CPU (EA or Offset) are passed
to the MMU to be checked against the limit in the segment descriptor and
are there added to the segment base address in the descriptor to form a
linear address. On an 80386 and later processors, the linear address is
further processed by the paged MMU before the final result (physical
address) appears on the chip address bus.
The 80286 doesn't have a paged MMU so the linear address is output
directly as the physical address.
The VM flag bit, in the EFLAGS register, selects virtual mode operation
in protected mode. Once this mode is entered, any attempt to access
memory beyond 1 MB will result in an error. Thus, the purpose of a V86
task is to form a "virtual machine" with which we can execute an 8086
program. A complete virtual machine consists not only of 80386
hardware but also of system software. Thus, the emulation of an 8086 is
the result of cooperation between hardware and software.
[Figure: x86-64 (AMD64) operation modes: legacy modes and 64-bit mode]
As shown in figure 3-1(d), the 64-bit mode of x86-64 utilizes the flat
memory model. In fact, most of the modern operating systems
neglect the segmentation features available in the legacy x86
architecture. Instead, operating systems handle segmentation
functions entirely in software.
Note that, within the 1 MB memory space of 8086/8088 processors, the
20-bit linear address is equivalent to the 20-bit physical address in memory.
As shown in figure 3-2(c), the offset of the code segment is obtained from
the IP register content. Therefore, when the processor wants to fetch a
new instruction from the code segment, it adds the IP content (offset) to
the CS content shifted left by four bits (the code segment base address) to
point to that instruction. After
executing the instruction, the microprocessor needs to point to the
next instruction. To obtain the next instruction address, the processor
increments the IP register (by the length of the last instruction in bytes) and
adds the IP content to the shifted CS content, as indicated above, and so on.
Table 3.1 indicates
the segment registers and the location of their corresponding offsets. It
should be noted that the data segment, DS, is usually used to store
program variables.
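The real-mode address calculation described above can be sketched in a few lines of Python. The register values used in the example are arbitrary, and the function name is hypothetical; the wrap-around at 1 MB models the 20 address lines of the 8086/8088.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """20-bit physical address = (16-bit segment << 4) + 16-bit offset."""
    segment &= 0xFFFF
    offset &= 0xFFFF
    return ((segment << 4) + offset) & 0xFFFFF   # only 20 address lines


cs, ip = 0x1234, 0x0010            # arbitrary sample values
print(hex(real_mode_address(cs, ip)))

ip = (ip + 3) & 0xFFFF             # after executing a 3-byte instruction
print(hex(real_mode_address(cs, ip)))
```

Note that different segment:offset pairs can point to the same physical byte; for instance, 1234:0010 and 1235:0000 both give 12350h.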
[Figure: 1 MB memory map from 00000 to FFFFF, in which the CS, DS, SS and ES registers point to the CODE, DATA, STACK and EXTRA segments]
Fig. 3-2(a). Memory segmentation in x86 systems. In real mode, each segment
register points directly to the beginning of corresponding segment in memory.
In real mode, the resulting offset is a 16-bit value that is sometimes called
the effective address (EA)¹.
Table 3.1. Segment registers and their typical offsets in x86 microprocessors
¹ Intel manuals sometimes call this combination the effective address (EA) and sometimes call it Offset
(when they discuss the assembly language).
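[Figure: the 16-bit segment register (CS, DS, SS or ES), extended with 0000, is added to the OFFSET to give the physical address]
[Figure: 1 MB memory map from 00000 to FFFFF, in which CS and IP locate the next instruction (program code) in the CODE segment and SS locates the STACK segment]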
Offset-Address Generation: Figure 3-3 depicts how the 80386 and later
processors can generate a 32-bit address. As shown, the 32-bit offset, or
effective address (EA), is generally the sum of a base register, a scaled
index register and a displacement.
This summation is similar to equation (3-2), except that the base and
index registers are all 32-bit in 80386 and later processors.
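The summation can be illustrated with the following Python sketch. It assumes the usual 32-bit x86 form EA = base + index x scale + displacement, with a scale factor of 1, 2, 4 or 8; the function name and the sample register values are hypothetical.

```python
def effective_address(base: int = 0, index: int = 0, scale: int = 1,
                      displacement: int = 0) -> int:
    """32-bit offset = base + index * scale + displacement (modulo 2**32)."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale factor must be 1, 2, 4 or 8")
    return (base + index * scale + displacement) & 0xFFFFFFFF


# Example: element 3 of an array of 4-byte items that starts 0x20 bytes
# past the address held in the base register.
print(hex(effective_address(base=0x1000, index=3, scale=4, displacement=0x20)))
```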
Fig. 3-3(a). Generation of the 32-bit address offset (or effective address) in 80386
and later processors.
Segment Selector
Bits    15 ... 3   2    1 ... 0
Field   Index      TI   RPL
(The Index and TI fields together form the 14-bit selector.)

Segment Descriptor
Bits    63 ... 52   51 ... 48   47 ... 41   40   39 ... 16   15 ... 0
Field   Base        Limit       Access      0    Base        Limit
        Address                 rights           Address
        A24 - A31   L16 - L19                    A0 - A23    L0 - L15

[Figure: the selector (e.g., 13A7) picks one segment descriptor from the descriptor table; the descriptor locates the segment (e.g., at 032DD000) and the 32-bit offset (e.g., 0010F405) locates the operand inside it, within the memory space from 00000000 to FFFFFFFF]
Figure 3-4(d). Combining the 32-bit effective address (offset) with the segment
selector to obtain the linear address in protected memory
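The selector fields and the base-plus-offset step can be tried out with a minimal Python sketch. The selector 13A7h and the offset 0010F405h are the sample values from the figure; reading 032DD000h as the segment base and using a 4 GB limit are assumptions made for this example, and the function names are hypothetical.

```python
def decode_selector(selector: int) -> dict:
    """Split a 16-bit segment selector into Index, TI and RPL fields."""
    selector &= 0xFFFF
    return {
        "index": selector >> 3,        # bits 15-3: descriptor table index
        "ti": (selector >> 2) & 1,     # bit 2: table indicator
        "rpl": selector & 0b11,        # bits 1-0: requested privilege level
    }


def linear_address(base: int, offset: int, limit: int) -> int:
    """Check the offset against the segment limit, then add the base."""
    if offset > limit:
        raise ValueError("offset exceeds segment limit")
    return (base + offset) & 0xFFFFFFFF


print(decode_selector(0x13A7))
print(hex(linear_address(base=0x032DD000, offset=0x0010F405, limit=0xFFFFFFFF)))
```

The limit check mirrors the protection step in which the offset is compared with the limit stored in the segment descriptor before the base address is added.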
The so-called page directory is a special page of memory, which holds the
translation table entries. It has up to 1024 page translation table entries,
each 4 bytes long. The paging mechanism makes use of the control
register CR3 of the microprocessor (the page directory base register),
which holds the physical address of the page directory.
The whole 4 GB linear address space is divided into 1024 regions of 4 MB,
one per page directory entry. Each entry
in the page directory translates the leftmost 10 bits of a linear address.
Bits    31 ... 12            11 10 9    8  7  6  5  4  3  2    1    0
Field   Page Table Address   Reserved   0  0  D  A  0  0  U/S  R/W  P
A: Accessed bit (set to 1 whenever the microprocessor accesses this page entry)
D: Dirty bit (used by the operating system)
P: Present bit (set to 1 when the page entry can be used in translation)
R/W: Read/write bit (used in the page protection scheme)
U/S: User/supervisor bit (used with the R/W bit to develop the page priority level)
Figure 3-5(a). One of the page directory records (translation table entry).
A few entries of the translation table are cached in the MMU Translation
Look-aside Buffer (TLB) to avoid excessive memory accesses. The TLB
stores the 32 most frequently used page table entries and page directory
entries on the processor cache memory (L1-Cache).
The paging mechanism can also admit the use of memory in certain areas,
where no memory exists (e.g., in system BIOS and VIDEO BIOS
ROMs). For instance, the EMM386.EXE program can be used to page the
extended memory area (above the 640kB DOS limit) into 4kB blocks or
pages.
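Linear Address
Bits    31 ... 22         21 ... 12     11 ... 0
Field   Directory Index   Table Index   Offset
[Figure: the directory index selects a page directory entry holding the page table address, the table index selects a page table entry holding the page address, and the offset is added to the page address]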
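The two-level translation can be followed step by step in a small Python model. Physical memory is represented here by a dictionary that maps the address of an entry to its 32-bit value; this toy memory, the CR3 value and the sample entries are all assumptions made for illustration, and only the present bit (P) of each entry is checked.

```python
PRESENT = 0x1           # bit 0 of a directory or table entry
ADDRESS_MASK = 0xFFFFF000


def split_linear_address(linear: int):
    """Return (directory index, table index, offset) of a 32-bit linear address."""
    return (linear >> 22) & 0x3FF, (linear >> 12) & 0x3FF, linear & 0xFFF


def translate(linear: int, cr3: int, memory: dict) -> int:
    """Walk the page directory and page table to get the physical address."""
    directory_index, table_index, offset = split_linear_address(linear)
    pde = memory.get((cr3 & ADDRESS_MASK) + 4 * directory_index, 0)
    if not pde & PRESENT:
        raise LookupError("page table not present")
    pte = memory.get((pde & ADDRESS_MASK) + 4 * table_index, 0)
    if not pte & PRESENT:
        raise LookupError("page not present")
    return (pte & ADDRESS_MASK) | offset


# Toy set-up: directory at 0x1000, one page table at 0x2000, one page at 0x5000.
memory = {0x1004: 0x2001, 0x200C: 0x5001}
print(split_linear_address(0x00403025))
print(hex(translate(0x00403025, cr3=0x1000, memory=memory)))
```

In a real processor a missing entry causes a page fault, which the operating system handles; the LookupError in the sketch only stands in for that event.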
[Figure: privilege rings 0 to 3, with the OS kernel in ring 0 and the least secure level in ring 3]
For switching back to real-address mode, the software should clear the
PE bit in CR0 with a MOV to CR0 instruction. A procedure that attempts
to do this, however, should proceed as follows:
With these structures the 80386 and later processors can rapidly switch
execution from one task to another, saving the context of the original task
so that the task can be restarted later.
2
Since AMD64 and Intel 64 are substantially similar, many software and hardware products use one
vendor-neutral term to indicate their support for both implementations. AMD's original designation for
this processor architecture, "x86-64", is still sometimes used for this purpose.
anywhere in the linear 64-bit address space. The operating system can use
separate selectors for code, stack, and data segments for memory-
protection, but the base address of all these segments is always 0.
• Compatibility mode—This mode uses a protected, multi-segment model
of virtual memory, just as in legacy protected mode. The 32-bit virtual-
memory space is treated as a segmented set of address spaces for code,
stack, and data segments, each with its own base address and protection
parameters. A segmented space is specified by adding a segment selector
to an address.
Although virtual addresses are 64 bits wide in 64-bit mode, current
implementations do not allow the entire virtual address space of 2^64 bytes
to be used. Most operating systems and applications will not need such a
large address space for the foreseeable future. For example, 64-bit Windows
populates only 16 TB (44 bits of address), so supporting such wide virtual
addresses would simply increase the complexity and cost of address
translation with no real benefit. AMD therefore decided that, in the first
implementations of the x86-64 architecture, only the least significant
48 bits of a virtual address would actually be used in address translation.
However, bits 48 through 63 of any virtual address must be copies of bit
47, or the processor will raise an exception.
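The rule that bits 48 through 63 must copy bit 47 can be checked with a few lines of Python. This is only a sketch for a 48-bit implementation; the function name is an assumption, and the address is taken to be an unsigned 64-bit integer.

```python
def is_canonical(address):
    """True if bits 63..48 of a 64-bit address are copies of bit 47."""
    upper = address >> 47          # bits 63..47 (17 bits)
    return upper == 0 or upper == 0x1FFFF


print(is_canonical(0x00007FFFFFFFFFFF))  # top of the lower half
print(is_canonical(0xFFFF800000000000))  # bottom of the upper half
print(is_canonical(0x0000800000000000))  # bit 47 set, bits 63..48 clear
```

The first two addresses satisfy the rule; the third does not, so an access to it would raise an exception.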
The following table depicts the register usage in legacy and 64-bit
operation modes
The processor also maintains a stack, which conventionally occupies the
highest addresses of the memory assigned to a program. Data is added to the
stack with the PUSH operation and removed with the POP operation. The stack
may be contained in a memory segment identified by the segment selector in
the SS register. A stack can be up to 4 GB long, the maximum size of a segment.
When using the flat memory model, the stack can be located anywhere in
the linear address space dedicated to the program. As shown in the following
figure, the PUSH instruction places a register or memory value on the top of
the stack. Note that the "top" of the stack is actually the lowest address of
the stack memory. We picture a stack of papers as growing upward, but in
memory the stack starts at a high address and grows downward due to
architectural considerations. Therefore, the "top of the stack" is at the
bottom of the stack's memory. If data is pushed continually, the stack keeps
growing down in memory until it collides with the program code or data. This
condition is called stack overflow. The POP instruction removes the value at
the top of the stack and places it into a register or memory location of
your choice.
When an item is pushed onto the stack, the processor decrements the ESP
register, then writes the item at the new top of stack. When an item is
popped off the stack, the processor reads the item from the top of stack,
then increments the ESP register. In this manner, the stack grows down in
memory (to lesser addresses) when items are pushed on the stack and
shrinks up (to greater addresses) when the items are popped from stack.
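The decrement-then-write and read-then-increment order can be modelled in a few lines of Python. This is a toy sketch, not a description of the hardware: the class name, the dictionary used as memory and the starting ESP value are assumptions.

```python
class Stack32:
    """Toy model of a 32-bit stack; memory is a dict of byte addresses."""

    def __init__(self, esp):
        self.esp = esp
        self.memory = {}

    def push(self, value):
        self.esp -= 4                                  # decrement ESP first
        for i, byte in enumerate(value.to_bytes(4, "little")):
            self.memory[self.esp + i] = byte           # then write the item

    def pop(self):
        data = bytes(self.memory[self.esp + i] for i in range(4))
        self.esp += 4                                  # increment ESP after reading
        return int.from_bytes(data, "little")


stack = Stack32(esp=0x1000)
stack.push(0x11223344)
print(hex(stack.esp))        # the stack has grown down by 4 bytes
print(hex(stack.pop()), hex(stack.esp))
```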
Similarly, the POP ESP instruction increments the stack pointer (ESP)
before data at the old top of stack is written into the destination.
A. Push Operation
The PUSH instruction decrements the stack pointer and then stores the
source operand on the top of the stack. The address-size attribute of the
stack segment determines the stack pointer size (16, 32 or 64 bits). The
operand-size attribute of the current code segment determines the amount
the stack pointer is decremented (2, 4 or 8 bytes). In non-64-bit modes: if
the address-size and operand-size attributes are 32, the 32-bit stack
pointer (ESP) is decremented by 4. If both attributes are 16, the 16-bit SP
register (stack pointer) is decremented by 2.
IF StackAddrSize = 64
  THEN
    IF OperandSize = 64
      THEN
        RSP ← (RSP − 8);
        IF (SRC is FS or GS)
          THEN TEMP ← ZeroExtend64(SRC);
        ELSE IF (SRC is IMMEDIATE)
          THEN TEMP ← SignExtend64(SRC);
        ELSE
          TEMP ← SRC;
        FI;
        SS:RSP ← TEMP; (* Push quadword *)
      ELSE (* OperandSize = 16; 66H prefix used *)
        RSP ← (RSP − 2);
        SS:RSP ← SRC; (* Push word *)
    FI;
ELSE IF StackAddrSize = 32
  THEN
    IF OperandSize = 32
      THEN
        ESP ← (ESP − 4);
        IF (SRC is FS or GS)
          THEN TEMP ← ZeroExtend32(SRC);
        ELSE IF (SRC is IMMEDIATE)
          THEN TEMP ← SignExtend32(SRC);
        ELSE
          TEMP ← SRC;
        FI;
        SS:ESP ← TEMP; (* Push doubleword *)
      ELSE (* OperandSize = 16 *)
        ESP ← (ESP − 2);
        SS:ESP ← SRC; (* Push word *)
    FI;
ELSE (* StackAddrSize = 16 *)
    IF OperandSize = 16
      THEN
        SP ← (SP − 2);
        SS:SP ← SRC; (* Push word *)
      ELSE (* OperandSize = 32 *)
        SP ← (SP − 4);
        SS:SP ← SRC; (* Push doubleword *)
    FI;
FI;
B. Pop Operation
The POP instruction loads the value from the top of the stack to the
location specified with the destination operand (or explicit opcode) and
then increments the stack pointer. The destination operand can be a
general-purpose register, memory location, or segment register.
IF StackAddrSize = 32
  THEN
    IF OperandSize = 32
      THEN
        DEST ← SS:ESP; (* Copy a doubleword *)
        ESP ← ESP + 4;
      ELSE (* OperandSize = 16 *)
        DEST ← SS:ESP; (* Copy a word *)
        ESP ← ESP + 2;
    FI;
ELSE IF StackAddrSize = 64
  THEN
    IF OperandSize = 64
      THEN
        DEST ← SS:RSP; (* Copy a quadword *)
        RSP ← RSP + 8;
      ELSE (* OperandSize = 16 *)
        DEST ← SS:RSP; (* Copy a word *)
        RSP ← RSP + 2;
    FI;
ELSE (* StackAddrSize = 16 *)
    IF OperandSize = 16
      THEN
        DEST ← SS:SP; (* Copy a word *)
        SP ← SP + 2;
      ELSE (* OperandSize = 32 *)
        DEST ← SS:SP; (* Copy a doubleword *)
        SP ← SP + 4;
    FI;
FI;
As shown in the figure, stack operations (like push and pop) work on words
(and not on bytes). For instance, look at the last example (PUSH AX).
Here the stack pointer is first decremented by 2 bytes, because the stack
grows from high to low memory addresses. The high byte (AH) is then stored
at the higher of the two addresses and the low byte (AL) at the lower
address, which is the new top of stack.
Figure 3-14. Stack organization with local variables for calling procedures.
Linear Address    Contents                          Decimal
(Hexadecimal)
FFFFFH            ROM (BIOS), 8 kB                  1024 kB
                  ROM (BASIC Compiler), 32 kB
                  ROM (user), 8 kB
                  ROM (expansion), 168 kB
                  ROM (Hard Disk Driver BIOS)
C0000H            ROM expansion (32 kB)             768 kB
BFFFFH            RAM Video Adaptor (128 kB)
A0000H
9FFFFH            RAM (user)                        640 kB
00000H                                              0 kB
• Main Memory from 1 MB to the top of memory (up to 4 GB of system memory).
• PCI Memory from the top of memory to 4 GB, with 2 ranges:
- APIC Configuration Space from FEC0_0000H (4 GB - 20 MB) to
FECF_FFFFH and from FEE0_0000H to FEEF_FFFFH.
- High BIOS area from 4 GB - 2 MB to 4 GB.
When the processor is in supervisor state, you can use special load and
store instructions to access data values in alternate memory spaces. For
example, you can load a value from the user data space, or store a value
into the user instruction space. These instructions require an explicit
address space indicator (ASI). Table 3-3 summarizes the ASI values used
for these instructions.
Table 3-3. ASI values for different addressing spaces, in SPARC processors.
All the ARM instructions can address any of the 16 visible registers.
The main bank of 16 registers is used by all unprivileged code. These are
the User mode registers. User mode is different from all other modes as it
is unprivileged, which means it can only switch to another processor
mode by generating an exception. The SWI instruction provides this
facility from program control.
Out of the 16 visible registers, the following three registers have special
roles:
Stack pointer Software normally uses R13 as a Stack Pointer (SP). R13
is used by the PUSH and POP instructions in T variants, and by the SRS
and RFE instructions from ARMv6.
Link register Register 14 is the Link Register (LR). This register holds
the address of the next instruction after a Branch and Link (BL or BLX)
instruction, which is used to make a subroutine call. It is also used for
return address on entry to exception modes. At other times, R14 can be
used as a general-purpose register.
3-8.2. Stacks
The processor uses a full descending stack. This means the stack pointer
holds the address of the last stacked item in memory. When the processor
pushes a new item onto the stack, it decrements the stack pointer and then
writes the item to the new memory location. The processor implements
two stacks, the main stack and the process stack, with a pointer for each
held in independent registers,
3-8. Summary
The physical memory of computer systems is usually organized as a
sequence of 8-bit bytes. Each byte is assigned a unique address that
ranges from zero to a maximum allowed memory space, which depends
on the width of the address bus. For instance, the 8086 has a 20-bit address
bus and can address up to 1 MB (2^20 bytes). On the other hand, the 80386
has a 32-bit address bus and can address up to 4 GB (2^32 bytes).
The 80x86 processor families can operate in various modes. The Intel
8086, Intel 8088, Intel 80188 and Intel 80186 had only real mode.
The 16-bit offset or effective address (EA) may be also considered as the
16-bit instruction pointer (IP) or the 16-bit stack pointer (SP). There are
some special combinations of segment registers and general registers that
point to important addresses:
CS:IP points to the address of the next byte of code the processor will fetch.
SS:SP points to the location of the last item pushed onto the stack.
With the advent of the 32-bit 80386 processor, the 16-bit general-purpose
registers, base registers, index registers, instruction pointer, and FLAGS
register, but not the segment registers, were expanded to 32 bits. This is
represented by prefixing an "E" (for Extended) to the register names.
Thus the expanded AX became EAX, SI became ESI and so on. The
general-purpose registers, base registers, and index registers could all be
used as the base in addressing modes, and all of those registers except for
the stack pointer could be used as the index in addressing modes. Two
new segment registers (FS and GS) were added. With a greater number of
registers, instructions and operands, the machine code format was
expanded. To provide backward compatibility, segments with executable
code can be marked as containing either 16-bit or 32-bit instructions.
Special prefixes allow inclusion of 32-bit instructions in a 16-bit segment
or vice versa.
3-9. Problems
3-2) List and explain the use of the four segment registers in the 8086
microprocessors. Also describe how the 20-bit physical address is
obtained from the 16-bit segment and offset addresses.
3-8) Describe how the Translation Look-aside Buffer (TLB) can save time
when a linear-to-physical memory translation is needed.
3-11) The unit which acts as an intermediate agent between memory and
backing store to reduce process time is _____ .
a) TLB’s
b) Registers
c) Page tables
d) Cache
3-16) Write-back:
(a) reverses the order of the bits of data.
(b) is used to double-check the accuracy of data before use.
(c) is only used in the little-endian system.
(d) stores results in the cache rather than in the external memory.
3-10. Bibliography
Microprocessor
Instructions
Contents
4-1. Introduction
4-2. Data Types (Bytes, Words, Integers, Floating point numbers, … )
4-3. Instruction Format of x86 Microprocessors
4-4. Addressing Modes of x86 Microprocessors
4-5. Intel 8086/80186/80286/80386/80486 Instruction Set (Alphabetical)
4-6. Basic Instruction Set of x86 Microprocessors (by category)
4-6.1. Data Transfer Instructions
4-6.2. Arithmetic Instructions
4-6.3. Logic Instructions
4-6.4. String Instructions
4-6.5. Program Control Instructions
4-6.6. Processor Control Instructions
4-7. Math Coprocessor (x87) Instructions
4-8. Subroutine Calls & Interrupts in x86 Microprocessors
4-8.1. Subroutine Calls (CALL)
4-8.2. Interrupts (INT)
4-8.3. Masking Interrupts (Turning Interrupts Off)
4-8.4. Interrupts Priority
4-9. IBM PC Interrupts & DOS Calls
4-9.1. PC Boot Process
4-9.2. PC Interrupt Service Routines
4-9.3. BIOS Calls & DOS Calls
Prof. Dr. Muhammad El-SABA
Introduction to Microprocessors & Interface Circuits CHAPTER 4
Microprocessor
Instructions
4-1. Introduction
It is well known that digital computers can only understand and execute
machine (binary) codes. However, humans almost never write programs
directly in machine code. Instead, they use higher-level programming
languages which can be translated by special computer programs
(compilers) into machine code.
The byte is eight contiguous bits starting at any logical address. The bits
are numbered 0 through 7; bit zero is the least significant bit (LSB).
Address            Contents              Bits
FFFFF (higher)
  :
XXXX1              Higher byte (MSB)     15 ... 8
XXXX0              Lower byte (LSB)      7 ... 0
  :
00000 (lower)
Fig. 4-1(a). Byte and word organization in memory, according to the little endian
representation (lower byte has lower address)
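The little-endian order of Fig. 4-1(a) can be reproduced with Python's built-in integer conversions. The helper names, the sample address and the sample word are assumptions chosen for illustration.

```python
def store_word(memory, address, word):
    """Store a 16-bit word in little-endian order (lower byte at lower address)."""
    low, high = word.to_bytes(2, "little")
    memory[address] = low
    memory[address + 1] = high


def load_word(memory, address):
    """Read back a 16-bit little-endian word."""
    return memory[address] | (memory[address + 1] << 8)


memory = {}
store_word(memory, 0x0100, 0x1234)
print(hex(memory[0x0100]), hex(memory[0x0101]))   # 34H first, then 12H
print(hex(load_word(memory, 0x0100)))
```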
vii- Bit string: A contiguous sequence of bits. A bit string may begin at
any bit position of any byte and may contain up to 2^32 - 1 bits.
viii- BCD: A byte (unpacked) representation of a decimal digit in the
range 0 through 9. Unpacked decimal numbers are stored as unsigned
byte quantities. One digit is stored in each byte. The magnitude of a
number is determined by the low-order half-byte; hexadecimal values 0-9
are interpreted as decimal numbers.
The digit in the high-order half-byte is the most significant. Values 0-9
are valid in each half-byte. The range of a packed decimal byte is 0-99, as
shown in figure 4-1(b).
Unpacked BCD
Bit:    7 6 5 4     3 2 1 0
Digit:  (unused)    0-9
Packed BCD
Bit:    7 6 5 4     3 2 1 0
Digit:  0-9         0-9
Fig. 4-1(b). Packed and unpacked binary-coded decimal (BCD) number
representation.
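A brief Python sketch of the two encodings of Fig. 4-1(b) follows. The function names are illustrative, and storing the most significant digit first is an assumption made only to keep the example readable.

```python
def to_unpacked_bcd(number):
    """One decimal digit per byte, in the low-order half-byte."""
    return bytes(int(digit) for digit in str(number))


def to_packed_bcd(number):
    """Two decimal digits per byte; the high-order half-byte is more significant."""
    digits = str(number)
    if len(digits) % 2:
        digits = "0" + digits                 # pad to an even number of digits
    return bytes((int(digits[i]) << 4) | int(digits[i + 1])
                 for i in range(0, len(digits), 2))


print(to_unpacked_bcd(1995).hex())   # four bytes, one digit each
print(to_packed_bcd(1995).hex())     # two bytes, two digits each
```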
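$$N = \begin{cases} +\left(b_{-1}\,2^{-1} + b_{-2}\,2^{-2} + b_{-3}\,2^{-3} + \dots + b_{-L}\,2^{-L}\right), & N \ge 0 \\ -\left(b_{-1}\,2^{-1} + b_{-2}\,2^{-2} + b_{-3}\,2^{-3} + \dots + b_{-L}\,2^{-L}\right), & N < 0 \end{cases} \tag{4-1b}$$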
Fixed-point Number
Sign bit    L bits
0 or 1      b-1 b-2 b-3 ... b-L
(the decimal point lies between the sign bit and b-1)
Fig. 4-1(c). Fixed-point number representation (signed magnitude form).
Floating-point Number
Signed Exponent E           Signed Fraction (Mantissa) F
Sign bit    Exponent        Sign bit    Mantissa
0 or 1      ...             0 or 1      b-1 b-2 ... b-L
Excess-127 Exponent (Base 2): The IEEE 754 standard specifies that the
"exponent" will be encoded as an 8-bit value, using the unsigned binary
code of a value which is 127 more than (in "excess" of) the actual base 2
exponent required to represent the desired value.
Example 4-1.
Let's write the 32-bit number 434D4000H in IEEE 754 format. This
number may be re-written in binary form as follows:
4 3 4 D 4 0 0 0
0100 0011 0100 1101 0100 0000 0000 0000
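The fields of this bit pattern can be separated with a short Python sketch. The function name is an assumption, and the sketch handles only normalized numbers (it ignores zero, denormals, infinities and NaNs); the standard struct module is used as an independent cross-check.

```python
import struct


def decode_single(bits):
    """Decode a normalized IEEE 754 single-precision bit pattern."""
    sign = bits >> 31                      # 1 bit
    exponent = (bits >> 23) & 0xFF         # 8 bits, excess-127
    fraction = bits & 0x7FFFFF             # 23 bits
    value = (-1) ** sign * (1 + fraction / 2 ** 23) * 2.0 ** (exponent - 127)
    return sign, exponent - 127, value


sign, true_exponent, value = decode_single(0x434D4000)
print(sign, true_exponent, value)

# Cross-check with the library conversion of the same four bytes
print(struct.unpack(">f", (0x434D4000).to_bytes(4, "big"))[0])
```

For 434D4000H the sign bit is 0, the stored exponent is 134 (true exponent 7), and the value is 1.1001101010b x 2^7 = 205.25.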
64-bit Format (Long Real or Double Precision): The 64-bit format has
the same structure as the 32-bit format except that it uses an 11-bit exponent
encoded in excess-1023 notation and a 52-bit mantissa. This provides about 15
decimal digits of precision and a range of roughly 10^-308 to 10^308.
80-bit Format (Extended Precision): The 80-bit format also has the same
structure as the 32-bit format except that it uses a 15-bit exponent and a
64-bit mantissa.
Example 4-3:
Draw a schematic representation depicting the execution of the indirect
addressing instruction: MOV AX,[BX+3]. Assume that BX contains 8000H
and that the data segment offsets 8003H and 8004H contain 99H and
77H, respectively. Take the data segment register as (DS) = A000H.
Solution:
As shown in figure 4-4, this instruction causes the following actions:
Registers               Memory (physical address: content)
AX = 7799H              A8004H: 77H
BX = 8000H              A8003H: 99H
DS = A000H
Displacement = 3
EA = BX + 3 = 8003H
Fig. 4-4. Sequence of operations for executing the instruction: MOV AX,[BX+3]
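The address arithmetic of this example can be traced in Python. This is a sketch of real-mode addressing only (physical address = segment x 16 + offset); the helper name and the dictionary that stands in for memory are assumptions.

```python
def physical_address(segment, offset):
    """Real-mode 20-bit address: segment shifted left 4 bits plus 16-bit offset."""
    return ((segment << 4) + (offset & 0xFFFF)) & 0xFFFFF


DS, BX = 0xA000, 0x8000
memory = {0xA8003: 0x99, 0xA8004: 0x77}         # contents given in Example 4-3

ea = (BX + 3) & 0xFFFF                          # effective address
address = physical_address(DS, ea)
AX = memory[address] | (memory[address + 1] << 8)   # little-endian word load

print(hex(ea), hex(address), hex(AX))
```

The low byte of AX comes from A8003H and the high byte from A8004H, which gives AX = 7799H as in Fig. 4-4.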
For instance, there are only three instructions to set three flags, namely:
STC (set CF), STI (set IF) and STD (set DF).
In order to set other flags, you have to get around by pushing all flags into
the stack (PUSHF) and changing the flag bits you would like to modify
and then popping the stack back into the FLAG register (POPF). The
following piece of code sets the TF.
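PUSHF                          ; push FLAGS onto the stack
MOV BP,SP                      ; BP points to the saved FLAGS word (SS-relative)
OR WORD PTR [BP+0],0100H       ; mask 0100H sets bit 8, the trap flag (TF)
MOV SP,BP                      ; SP is unchanged here; kept for symmetry
POPF                           ; pop the modified word back into FLAGS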
From now on, we may use the notation (E)REG, where REG is any 16-bit
register, to indicate that an instruction may be used with either the 16-bit
register (e.g., SP) or the corresponding 32-bit register (ESP). Also, we
may use the notation (R)REG with either a 16-bit register (e.g., SP) or the
corresponding 32-bit register (ESP) or the corresponding 64-bit register
(RSP). However, we sometimes drop the initial letter (E) or (R), for the
matter of simplicity.
The conditional move instruction (CMOV) was borrowed from the world
of single instruction computers (SIC), which are some sort of optimized
RISC machines, to enrich the instruction set of the Pentium Pro and later
processors. CMOV has the general syntax CMOVx dest,src, which corresponds
to the following statement:
IF (condition x is true) THEN dest ← src
where x is the condition code. For instance, CMOVS AX,BX transfers BX to AX
only if the sign flag is set (SF = 1).
IN and OUT are special data transfer instructions. IN reads data from an
input port and OUT writes data to an output port. IN and OUT have
various formats. For instance, IN AL,1FH inputs data to AL from the
input port at address 1FH. Also, OUT 0FH,AL outputs data from AL to
the output port at address 0FH. The DX register is sometimes used to hold
the port address. For instance, IN AL,DX inputs data to AL from the
input port whose address is stored in DX. Such instructions will be
discussed in detail in chapter 8.
The translate byte instructions (XLAT / XLATB) are frequently used for
translating values from one encoding format to another. XLATB replaces the
byte in AL with a byte from a user table addressed by BX. The original
value of AL is the index into the translate table. For instance, if we want
to transform a 4-bit binary number to an ASCII-encoded hexadecimal
digit, we proceed as follows (assuming the 4-bit digit is in AL):
Example 4-4:
Show how to use the XLAT instruction to perform the translation from
binary-coded decimal (BCD) to 7-segment code, as shown in figure 4-5.
Solution:
Consider the following BCD to 7-segment translation program, where the
unpacked BCD string begins at the address BCD_STR and the resultant
7-segment code is to be stored at 7SEG_STR.
Digit   G F E D C B A   Hex
0       0 1 1 1 1 1 1   3F
1       0 0 0 0 1 1 0   06
2       1 0 1 1 0 1 1   5B
3       1 0 0 1 1 1 1   4F
4       1 1 0 0 1 1 0   66
5       1 1 0 1 1 0 1   6D
6       1 1 1 1 1 0 1   7D
7       0 0 0 0 1 1 1   07
8       1 1 1 1 1 1 1   7F
9       1 1 0 1 1 1 1   6F
Fig. 4-5. BCD to 7-segment code translation.
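The table lookup that XLAT performs can be sketched in Python with the codes of Fig. 4-5. This is a model of the idea, not the assembly program of the example; the names SEVEN_SEGMENT, xlat and bcd_to_7seg are assumptions.

```python
# 7-segment codes for the digits 0..9, taken from Fig. 4-5
SEVEN_SEGMENT = bytes([0x3F, 0x06, 0x5B, 0x4F, 0x66,
                       0x6D, 0x7D, 0x07, 0x7F, 0x6F])


def xlat(table, al):
    """Model of XLAT: AL is replaced by the table byte at index AL."""
    return table[al]


def bcd_to_7seg(bcd_string):
    """Translate a string of unpacked BCD digits, one XLAT per byte."""
    return bytes(xlat(SEVEN_SEGMENT, digit) for digit in bcd_string)


print(bcd_to_7seg(bytes([1, 9, 9, 5])).hex())
```

In the assembly version, BX would hold the offset of the table and AL the BCD digit before each XLAT.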
7SEG ENDP
PUSH (push) decrements the stack pointer SP, then transfers the source
operand to the top of stack indicated by SP, as shown in figure 3-8. PUSH
is often used to place parameters on the stack before calling a procedure;
it is also the basic means of storing temporary variables on the stack.
PUSHA (Push All Registers) saves the contents of the eight general
registers on the stack. This instruction simplifies procedure calls by
reducing the number of instructions required to retain the contents of the
general registers for use in a procedure. The processor pushes the general
registers on the stack in the following order: AX, CX, DX, BX, the initial
value of SP before AX was pushed, BP, SI, and DI. PUSHA is
complemented by the POPA instruction.
POP (Pop) transfers the word or double word at the current top of stack,
indicated by SP, to the destination operand, and then increments SP to
point to the new top of stack, as shown in figure 3-8. POP moves
information from the stack to a general register, or to memory. There is
also a variant of POP that operates on segment registers.
POPA (Pop All Registers) restores the registers saved on the stack by
PUSHA, except that it ignores the saved value of SP. There exist other
instructions which deal with stack. They are summarized in table 4-3.
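The register order used by PUSHA and the way POPA skips the saved SP can be modelled as follows. This is a sketch for the 16-bit registers only; the Python list stands in for the stack memory, and the function names and register values are assumptions.

```python
PUSHA_ORDER = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]


def pusha(regs, stack):
    """Push the eight general registers; SP is saved with its initial value."""
    initial_sp = regs["SP"]
    for name in PUSHA_ORDER:
        stack.append(initial_sp if name == "SP" else regs[name])
        regs["SP"] = (regs["SP"] - 2) & 0xFFFF


def popa(regs, stack):
    """Restore the registers in reverse order, discarding the saved SP."""
    for name in reversed(PUSHA_ORDER):
        value = stack.pop()
        regs["SP"] = (regs["SP"] + 2) & 0xFFFF
        if name != "SP":
            regs[name] = value


regs = {"AX": 1, "CX": 2, "DX": 3, "BX": 4, "SP": 0x0100, "BP": 5, "SI": 6, "DI": 7}
stack = []
pusha(regs, stack)
print(hex(regs["SP"]), stack)
popa(regs, stack)
print(hex(regs["SP"]))
```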
value of the sign bit of the smaller item. This kind of conversion is called
sign extension.
1. The forms CWD, CDQ, CBW, and CWDE which operate only on
data in the EAX register.
2. The forms MOVSX and MOVZX, which permit one operand to be
in any general register while permitting the other operand to be in
memory or in a register.
CWD (Convert Word to Double word) and CDQ (Convert Double word
to Quad-Word) double the size of the source operand. CWD extends the
sign of the word in register AX throughout register DX. CDQ extends the
sign of the double word in EAX throughout EDX. CWD can be used to
produce a double word dividend from a word before a word division.
CBW (Convert Byte to Word) extends the sign of the byte in register AL
throughout AX.
CWDE (Convert Word to Double word Extended) extends the sign of the
word in register AX throughout EAX.
MOVSX:   1 x x x x x x x   →   1 1 1 1 1 1 1 1   1 x x x x x x x
MOVZX:   x x x x x x x x   →   0 0 0 0 0 0 0 0   x x x x x x x x
Fig. 4-6. Sign extension and zero extension, from 1 byte to 2 bytes.
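The two extensions of Fig. 4-6 can be written as small Python helpers. The names movsx8to16 and movzx8to16 are assumptions; they model only the byte-to-word case.

```python
def movzx8to16(value):
    """Zero extension: the upper byte of the result is always 00H."""
    return value & 0xFF


def movsx8to16(value):
    """Sign extension: the upper byte is filled with copies of bit 7."""
    value &= 0xFF
    return (value | 0xFF00) if (value & 0x80) else value


for byte in (0x7F, 0x80):
    print(hex(byte), hex(movzx8to16(byte)), hex(movsx8to16(byte)))
```

CBW applies the same sign extension to AL, with the result placed in AX.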
ADD operand1,operand2
When the result exceeds the length of the destination, the carry flag is set;
for multiplication, the result is extended into a second register. Note that
MUL takes a register or memory operand, not an immediate value. For instance,
with BX = 1234H, the word instruction MUL BX multiplies AX by the 16-bit
value 1234H and puts the 32-bit result in the register pair DX:AX. Likewise,
with EBX = 12345678H, the 32-bit multiplication MUL EBX multiplies EAX by
the 32-bit value 12345678H and stores the result in EDX:EAX. Figure 4-7(b)
illustrates these operations.
Note that MUL instruction multiplies two unsigned (positive) integers
while the IMUL instruction multiplies two signed integers (either
positive or negative). Similarly, the DIV instruction divides two unsigned
integers, while the IDIV instruction divides two signed integers.
MUL BX
          DX      AX      BX
Before    0000    7C48    0100
After     007C    4800    0100
MUL op
Size     Multiplicand   Multiplier   Product
Byte     AL             (op)         AH:AL
Word     AX             (op)         DX:AX
Dword    EAX            (op)         EDX:EAX
DIV op
Size     Dividend       Divider      Remainder   Quotient
Byte     AX             (op)         AH          AL
Word     DX:AX          (op)         DX          AX
Dword    EDX:EAX        (op)         EDX         EAX
Fig. 4-7(b). Illustration of MUL and DIV instructions, with different operand sizes.
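The word-sized rows of the figure can be modelled in Python as a rough sketch. The function names are assumptions, and the OverflowError merely stands in for the divide-error interrupt (INT 0) that the processor raises when the quotient does not fit.

```python
def mul16(ax, op):
    """Unsigned word multiply: returns (DX, AX, CF) with the product in DX:AX."""
    product = (ax & 0xFFFF) * (op & 0xFFFF)
    dx, ax = product >> 16, product & 0xFFFF
    return dx, ax, dx != 0                 # CF is set when DX is non-zero


def div16(dx, ax, op):
    """Unsigned word divide of DX:AX by op: returns (quotient AX, remainder DX)."""
    dividend = (dx << 16) | ax
    quotient, remainder = divmod(dividend, op)
    if quotient > 0xFFFF:
        raise OverflowError("quotient does not fit in AX (divide error)")
    return quotient, remainder


dx, ax, carry = mul16(0x7C48, 0x0100)
print(hex(dx), hex(ax), carry)
print([hex(x) for x in div16(dx, ax, 0x0100)])
```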
The ADC (add with carry) instruction also sums two binary operands
placing the result in the destination. If CF is set, a 1 is added to the
destination. The SBB (subtract with borrow) instruction subtracts the
source from the destination, and subtracts 1 extra if the Carry Flag is set.
Results are returned in destination.
The AAA (ASCII adjust for addition) and AAS (ASCII adjust for
subtraction) instructions make it possible to perform simple arithmetic
operations directly on ASCII numbers. Also, AAM (ASCII adjust for
multiplication) is used after multiplication of two unpacked decimal numbers.
The high-order nibble of each byte must be zeroed before using the AAM
instruction.
Similarly, the DAA (decimal adjust for addition) and DAS (decimal
adjust for subtraction) instructions make it possible to perform simple
arithmetic operations directly on BCD numbers. In fact, BCD digits (0-9) can
be stored instead of the usual binary numbers, such that each BCD digit
occupies 4 bits. So, if we add 2 bytes containing BCD numbers (each byte
contains 2 decimal digits), the result will not necessarily be correct,
because the ADD instruction assumes binary numbers. The DAA instruction
adjusts the result so that it is correct when read as BCD.
Example 4-5:
Suppose we have the packed decimal number 72 in AL and we want to add
19 to it. We know that the BCD result is 91. However, the ADD AL,19H
instruction results in AL = 8BH, which is not a valid BCD value. So, the DAA
instruction adjusts the content of AL to the correct answer, 91H.
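The adjustment can be followed step by step in a Python model of ADD followed by DAA. This is a sketch assuming the usual correction rule (add 06H when the low digit exceeds 9 or a half-carry occurred, add 60H when the binary sum exceeded 99H or a carry occurred); the function name is an assumption.

```python
def add_packed_bcd(al, operand):
    """Model ADD AL,operand followed by DAA; returns (binary sum, BCD sum, CF)."""
    raw = al + operand
    af = ((al & 0x0F) + (operand & 0x0F)) > 0x0F    # half-carry out of bit 3
    cf = raw > 0xFF
    old = raw & 0xFF                                # AL after the binary ADD
    result = old
    if (old & 0x0F) > 9 or af:
        result += 0x06                              # correct the low digit
    if old > 0x99 or cf:
        result += 0x60                              # correct the high digit
        cf = True
    return old, result & 0xFF, cf


binary_sum, bcd_sum, carry = add_packed_bcd(0x72, 0x19)
print(hex(binary_sum), hex(bcd_sum), carry)
```

For the operands of Example 4-5 the binary sum is 8BH and the adjusted result is 91H.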
The main logic operations are usually done between accumulator and
operand content, and the result is saved in accumulator. Figure (4-8)
depicts the main shift and rotate operations, and their effect on different
flags, in x86 microprocessors.
The AND, OR and XOR instructions operate on 8, 16 and 32 bit operands.
One of the interesting uses of AND instruction is to mask (set to zero)
selected bits in some value. For instance to mask all bits of AL except for
the first bit, we use the instruction:
Note that AND will change the destination content (AL) and will affect
the flags according to the result. The TEST instruction can be used for
masking without changing the content of the destination. For instance,
TEST is used to check if a certain bit is zero or one:
If the first bit of AL is zero, the result is zero and zero flag (ZF) is set.
Also, we can use TEST to check whether the content of DL is positive or
negative using the following instruction:
If the content of DL is positive (most significant bit is zero), then the zero flag (ZF) will be set.
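The masking ideas above can be tried out in Python. The masks used here (01H for the first bit, 80H for the sign bit of a byte) and the function names are assumptions chosen to match the description, since AND and TEST accept any mask.

```python
def and_mask(value, mask):
    """Model of AND dest,mask: returns (new destination, ZF)."""
    result = value & mask
    return result, result == 0


def zf_after_test(value, mask):
    """Model of TEST dest,mask: the destination is unchanged, only ZF is returned."""
    return (value & mask) == 0


print(and_mask(0xB7, 0x01))          # all bits masked except the first one
print(zf_after_test(0xB6, 0x01))     # first bit is zero, so ZF is set
print(zf_after_test(0x7F, 0x80))     # positive byte, so ZF is set
print(hex(0x5A ^ 0xFF))              # XOR with all ones toggles every bit
```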
The following program inputs data from an input port (PORT1) and checks
whether the first bit changes its value from high (1) to low (0):
Similarly, the XOR instruction can be used for toggling (ones to zeros
and zeros to ones) of a certain value. For instance, in order to toggle all
bits in MEMOVAL we use the instruction:
As shown in figure 4-8, in the SHR operation, all bits are shifted right by 1
bit, and the most significant (leftmost) bit is filled with zero. SHR is
equivalent to unsigned division by 2.
Similarly, in the SHL operation, all bits are shifted left by 1 bit, and the
least significant (rightmost) bit is filled with zero.
In SAR, the most significant bit is filled with a copy of its previous value
(the sign bit). So, the shifted register keeps its sign after the shift
operation, which divides a signed value by 2.
Fig. 4-8. Shift operations (SHR, SHL, SAR, SAL) and rotate operations (ROR, ROL, RCR, RCL), and their effect on the carry flag (CF).
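The single-bit shifts and rotates can be compared side by side with a Python sketch for 8-bit operands. Each hypothetical helper returns the new value together with the carry flag; inputs are assumed to lie in the range 0 to 255.

```python
def shr(v):
    return v >> 1, v & 1                             # zero enters at bit 7

def shl(v):
    return (v << 1) & 0xFF, (v >> 7) & 1             # zero enters at bit 0

def sar(v):
    return (v >> 1) | (v & 0x80), v & 1              # bit 7 (sign) is preserved

def ror(v):
    return (v >> 1) | ((v & 1) << 7), v & 1          # bit 0 wraps to bit 7

def rol(v):
    return ((v << 1) & 0xFF) | (v >> 7), (v >> 7) & 1

def rcr(v, cf):
    return (v >> 1) | (cf << 7), v & 1               # old CF enters at bit 7

def rcl(v, cf):
    return ((v << 1) & 0xFF) | cf, (v >> 7) & 1      # old CF enters at bit 0


for name, op in (("SHR", shr), ("SHL", shl), ("SAR", sar), ("ROR", ror), ("ROL", rol)):
    value, carry = op(0x81)
    print(name, hex(value), carry)
```

In every case the bit that leaves the operand ends up in CF; the rotate-through-carry forms also feed the old CF back in at the other end.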
(Figure: MOVS memory transfer. The source is addressed by DS:SI (DS = 0380H, SI = 2A03H) and the destination by ES:DI (DI = 6AB3H).)
Note that the direction flag (DF), which is bit 10 of the FLAGS register,
controls string instructions. Setting DF (to 1) causes string instructions to
auto-decrement; that is, to process strings from high to low addresses.
Clearing DF (to 0) causes string instructions to auto-increment, or to
process strings from low to high addresses. String instructions may be
preceded by the instruction prefix REP (repeat) or REPE (repeat if
equal) or REPNE (repeat if not equal), that repeat the instruction
operation, by the number of times specified by CX.
The first string instruction in the above table is the CMPS (compare
strings) instruction. CMPS is used for comparing two strings.
Similarly CMPSD compares two dword strings, and EDI and ESI are
decremented (or incremented) by 4, each time the instruction is invoked.
If the CMPSD command is repeated (by REP) the comparison is done
with the next dword of the string.
MOVS (and its variants MOVSB, MOVSW, MOVSD) copy data from the
source string addressed by DS:SI to the destination location ES:DI,
based on the size of the operand (byte, word or double word)
or the instruction used. MOVS also updates SI and DI. In byte string
transfers (MOVSB), SI and DI are incremented (+1) when the DF is
cleared and decremented (-1) when the DF is set.
Note that the prefixes REPE and REPNE are similar to REP in that they
cause the specified instruction to repeat for the number of times specified
by CX (until CX = 0). In addition, REPE stops the repetition as soon as the
zero flag is cleared (ZF = 0), and REPNE stops it as soon as the zero flag
is set (ZF = 1).
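The interplay of CX, DF and ZF in repeated string instructions can be modelled in Python. This is a simplified sketch: segment registers are ignored, a bytearray stands in for memory, and the function names and sample data are assumptions.

```python
def rep_movsb(memory, si, di, cx, df=0):
    """Model of REP MOVSB: copy CX bytes, stepping SI and DI according to DF."""
    step = -1 if df else 1
    while cx != 0:
        memory[di] = memory[si]
        si, di, cx = si + step, di + step, cx - 1
    return si, di, cx


def repe_cmpsb(memory, si, di, cx, df=0):
    """Model of REPE CMPSB: compare bytes until CX = 0 or a mismatch (ZF = 0)."""
    step = -1 if df else 1
    zf = None                      # flags are untouched if CX is initially 0
    while cx != 0:
        zf = memory[si] == memory[di]
        si, di, cx = si + step, di + step, cx - 1
        if not zf:
            break
    return si, di, cx, zf


memory = bytearray(b"HELLO.....HELPO")
print(rep_movsb(memory, 0, 5, 5), memory)
print(repe_cmpsb(memory, 0, 10, 5))
```

In the comparison, the fourth pair of bytes differs, so the loop stops with one count left in CX and ZF cleared.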
The jump instructions have one operand, which specifies the jump target.
Note that conditional jumps may be signed or unsigned. In signed
conditional jump, the sign flag (SF) is taken into account. For instance,
when executing the instructions JGE/JNL, which means jump if greater
or equal / jump if not less, the microprocessor checks if ( SF XOR OF)=0.
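The difference between a signed and an unsigned conditional jump can be seen by computing the flags of an 8-bit CMP in Python. The function names are assumptions; JAE (jump if above or equal) is used here as the unsigned counterpart of JGE, and operands are assumed to lie in the range 0 to 255.

```python
def cmp8(a, b):
    """Flags produced by CMP a,b on 8-bit operands (a - b, result discarded)."""
    result = (a - b) & 0xFF
    return {
        "CF": a < b,                                    # unsigned borrow
        "ZF": result == 0,
        "SF": bool(result & 0x80),
        "OF": bool((a ^ b) & (a ^ result) & 0x80),      # signed overflow
    }


def jge(flags):
    return flags["SF"] == flags["OF"]                   # (SF XOR OF) = 0


def jae(flags):
    return not flags["CF"]                              # CF = 0


flags = cmp8(0xFF, 0x01)      # FFH is 255 unsigned, but -1 signed
print(jge(flags), jae(flags))
```

For this pair of operands the signed jump is not taken (-1 is less than 1) while the unsigned jump is taken (255 is above 1).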
The LOCK instruction prefix and its corresponding output signal
should only be used to prevent other bus masters from interrupting a data
movement operation. LOCK may only be used with the following 80386
instructions when they modify memory. An undefined-opcode exception
results from using LOCK before any other instruction.
The Intel mnemonics for the 80x87 begin with the letter 'F' (no normal
8086 mnemonics begin with 'F'). For example, the mnemonic ADD
specifies an 8086 integer addition, while the mnemonic FADD selects an
8087 floating-point addition.
The 8087 has 68 basic instructions, which may be divided into the
following 6 groups:
So, subroutines can be nested to any required depth; the only limitation is
the stack size. As we'll see, later in chapter 5, subroutines are usually
called procedures in assembly programming environments.
Interrupts and exceptions are alike in that both cause the processor to
temporarily suspend its present program execution in order to execute a
program of higher priority. The major distinction between these two
kinds of interrupts is their origin. An exception is always reproducible by
re-executing with the program and data that caused the exception,
whereas an interrupt is generally independent of the executing program.
As we have seen in chapter 2, the 80x86 interrupts fall into one of the
following two categories:
Hardware Interrupt
* NMI : non-maskable Interrupt
* INTR : maskable Interrupt
Software Interrupt (handled by INT instruction)
* INT 0 ~ INT 255
* INT 0 : divide error
* INTO = INT 4 : interrupt on overflow
outside the bounds of the array. Invalid opcodes may be used by some
applications to extend the instruction set. In such a case, the invalid
opcode exception presents an opportunity to emulate the opcode.
The "coprocessor not available" exception occurs in older x86
processors if the program contains instructions for a coprocessor, but
no coprocessor is present in the system.
A coprocessor error is generated when a coprocessor detects an illegal
operation.
Table 4-11. First 10 exceptions and interrupts in x86 systems
INT nn
At first, the flags register, the CS register and IP are all pushed onto the
stack. The interrupt service routine address is then fetched from the
interrupt vector table, which occupies the absolute memory addresses 0:0
through 0:3FFH. IP is loaded from the word at address 0:4*nn, and CS is
loaded from the word at address 0:4*nn+2.
Then the program jumps to the new CS:IP address, which is the location
of the interrupt service routine, and starts its execution. When the
microprocessor encounters the IRET instruction (at the end of the interrupt
service routine), it pops the original IP and CS as well as the flags from
the stack, and resumes execution of the main program.
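The sequence described above can be modelled in a few lines. This is only a sketch under stated assumptions: memory is a Python bytearray holding the first 1kB, the stack is a Python list, and the function names are hypothetical. The clearing of IF and TF on entry is standard 8086 behaviour that the text does not spell out.

```python
import struct


def read_vector(memory, nn):
    """Fetch the vector for INT nn: IP word at 4*nn, CS word at 4*nn+2."""
    ip, cs = struct.unpack_from("<HH", memory, 4 * nn)
    return cs, ip


def software_interrupt(memory, stack, flags, cs, ip, nn):
    """Model INT nn: push FLAGS, CS and IP, then load CS:IP from the IVT."""
    stack.extend([flags, cs, ip])       # ip is the return address
    flags &= ~0x0300 & 0xFFFF           # clear IF (bit 9) and TF (bit 8)
    new_cs, new_ip = read_vector(memory, nn)
    return flags, new_cs, new_ip


def iret(stack):
    """Model IRET: pop IP, CS and FLAGS in the reverse order of the pushes."""
    ip = stack.pop()
    cs = stack.pop()
    flags = stack.pop()
    return flags, cs, ip


# Illustrative setup: install a made-up vector F000:1234 for INT 21H.
memory = bytearray(1024)
struct.pack_into("<HH", memory, 4 * 0x21, 0x1234, 0xF000)
stack = []
flags, cs, ip = software_interrupt(memory, stack, 0x0202, 0x1000, 0x0100, 0x21)
print(f"ISR at {cs:04X}:{ip:04X}")      # ISR at F000:1234
print(iret(stack) == (0x0202, 0x1000, 0x0100))  # True: original state restored
```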
Figure 4-11 depicts the structure of the interrupt vector table (IVT),
which is located at the lowest 400H bytes (1kB) of memory. It is a kind
of large jump table, which contains the addresses of the interrupt
service routines.
[Figure 4-11. Structure of the interrupt vector table (IVT) in memory. The
vector for INT nn occupies four bytes: the IP word at offset 4*nn and the
CS word at offset 4*nn+2, from INT 0 at 0000H up to INT 255 at 03FCH.]
It should be noted that the software interrupts (INT 00 through INT 04)
have predefined tasks and cannot be used for any other purpose. For
instance, the divide-by-zero interrupt (INT 00) is sometimes referred to as
a processor exception that the CPU is unable to handle, since division
by zero produces an undefined answer. So, INT 00 is invoked by the
microprocessor when an attempt is made to divide a number by zero. In
the IBM PC, the interrupt service routine of this interrupt displays the
message "DIVIDE BY ZERO ERROR" on the screen of the
microcomputer. The following code, for instance, will invoke INT 00:
MOV AL, 20 ; AL = 20
SUB CL, CL ; CL = 0
DIV CL     ; AX / CL with CL = 0: the CPU raises the divide error (INT 00)
Example 4-7: Use the dump command of the DEBUG program to find
the memory addresses of the interrupt service routines (ISR) of INT 00
through INT 03.
Solution: We introduce the DEBUG program in the next chapter and
show how to handle assembly programs using it. DEBUG has many
commands; here we only need the dump command "D", followed by the
address range to be displayed, which shows the memory content of a
range of bytes. The interrupt vectors of the first 4 interrupts can be
found in the following address range: 0000:0000 through 0000:000F
C:\> DEBUG
-D 0000:0000 000F
0000:0000 E8 56 2B 02 56 07 70 00 - C3 E2 00 F0 56 07 70 00
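Note that the low-order byte of each word is stored at the lower address,
because x86 processors use the little-endian convention (low byte = low
address). The first vector above, E8 56 2B 02, therefore reads IP = 56E8H
and CS = 022BH.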
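As a cross-check of the dump above, the sixteen bytes can be decoded into the four CS:IP pairs. The snippet is a minimal sketch; the function name decode_vectors is an assumption, and the byte string is copied from the DEBUG output shown in this example.

```python
# Bytes 0000:0000 through 0000:000F, copied from the DEBUG dump.
dump = "E8 56 2B 02 56 07 70 00 C3 E2 00 F0 56 07 70 00"
data = bytes.fromhex(dump)


def decode_vectors(data):
    """Split raw IVT bytes into (interrupt number, CS, IP) tuples."""
    vectors = []
    for nn in range(len(data) // 4):
        ip = int.from_bytes(data[4 * nn:4 * nn + 2], "little")      # low word
        cs = int.from_bytes(data[4 * nn + 2:4 * nn + 4], "little")  # high word
        vectors.append((nn, cs, ip))
    return vectors


for nn, cs, ip in decode_vectors(data):
    print(f"INT {nn:02X}: ISR at {cs:04X}:{ip:04X}")
# INT 00: ISR at 022B:56E8
# INT 01: ISR at 0070:0756
# INT 02: ISR at F000:E2C3
# INT 03: ISR at 0070:0756
```

The actual addresses depend on the machine and the DOS version, so a dump taken on another system will show different values.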
The assembly language instruction CLI clears the interrupt flag (IF)
and disables all maskable interrupts.
The assembly language instruction STI sets the interrupt flag (IF)
and enables the maskable interrupts.
Note that an interrupt can itself be interrupted. However, two interrupts
are never serviced simultaneously, because they are given different
priorities.
Notice that IRQ7 is used by the printer drivers (in MSDOS), so has a very
high priority, and that this accounts for why the MSDOS computers were
tied up and unable to do anything during a print job.
Example 4-8.
Show how the IBM PC uses Date and Time interrupt to keep its internal
clock.
Solution:
The computer keeps track of the date and the time to within 1/100th of a
second, and the time is stored as follows:
Once loaded in memory, the boot loader program is given control of the
CPU and a series of instructions are executed that will look in the
directory of the disk for the system files, DOS.SYS and BIO.SYS. If
these two system files are on the disk, they are loaded into low memory
in that order, along with any driver programs that are listed in the
"device =" statements of the config.sys file. Control of the CPU is then
given to the DOS program to finish the boot-up process, by loading the
command processor program COMMAND.COM into memory in the next
available space right after DOS.SYS. If the system files do not exist or
they are corrupt in any way, the boot loader program will display the
familiar message "DISK BOOT FAILURE".
Chapter 9. The next 100H (256) bytes, starting at 00400H, are used by the
operating system to store a complete equipment list in HEX form. Then,
starting at 00500H, comes the IBMBIO.SYS program, which contains all of
the subroutines used by the operating system to interface with the
hardware. The BIOS contains very low-level instructions that communicate
with such devices as the keyboard, printer, video, disk drives and the
chips on the motherboard. The next program loaded is called IBMDOS,
which contains subroutines that interface with the Disk Operating System
(DOS).
Address range      Contents
C8000H             (upper region, marked 256kB in the figure)
A0000H-BFFFFH      VRAM (video adaptor RAM swaps here), 128kB
00500H-9FFFFH      DOS (IBMBIOS.SYS / IO.SYS), starting at 00500H
                   DOS (IBMDOS.SYS / MSDOS.SYS)
                   DOS (COMMAND.COM program)
                   (conventional memory ends at 640kB)
00400H-004FFH      DOS (Equipment List)
00000H-003FFH      Interrupt Vector Table (IVT)
All internal commands are located inside command.com itself and all
external commands are located on the disk. After the command is
interpreted, command.com will pass all the parameters to the IBMDOS
program. IBMDOS will process the parameters into the proper format for
the IBMBIO program, which will actually turn-on the drive or control the
hardware needed for the command entered. MS-DOS is a three level
operating system, such that it is made up of three programs that are at
three levels of programming. The COMMAND.COM program is the
highest level because it understands commands like DIR, COPY etc. The
IBMDOS.SYS is the second highest level because it receives instructions
from command.com and passes them on to IBMBIO.SYS, which is the
lowest level of programming.
All of the subroutines, that are part of these three programs, perform
specific functions in the computer. For example, there are sub-routines
that are written just to control the video monitor and how it displays data
on the screen. The subroutines used in the operating system require many
8088 instructions to perform a specific task and are very complex in the
program style. Fortunately, IBM and Microsoft designed a method that
allows the computer programmer to utilize all of these tested and proven
subroutines in their own programs. This makes programming easier
because programmers will spend less time writing code that has already
been written by knowledgeable programmers.
Table 4-12. Summary of main Interrupt vectors, in IBM PC and compatible computers
Note that the interrupt types 20H-3FH are serviced by DOS routines. DOS
interrupts are often referred to as DOS CALLS and they are all INT
instructions. The other interrupt type is often called a BIOS interrupt,
because it calls subroutines inside IBMBIO.SYS or the ROM BIOS chip.
The BIOS interrupts are much faster than DOS interrupts because the
BIOS is a low-level program, just in contact with the hardware layer, as
shown in figure 4-12.
[Figure 4-12. Software layers, from top to bottom: Application, DOS, BIOS,
hardware (HW).]
Example 4-9:
Show how to use INT 21H to get the system time (DOS function 2CH, that
is, 44 in decimal).
Solution:
INT 21H is concerned with DOS function calls. To get the system time
we use DOS function number 2CH (44 decimal). This number should be put
into AH before we call interrupt 21H. After calling INT 21H, the result
(system time) can be found in CX and DX (CH = hours, CL = minutes,
DH = seconds, DL = hundredths of a second), as follows:
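The register layout returned by this DOS call can be illustrated off-line. The sketch below does not call DOS; it only assumes that CX and DX already hold the values returned by function 2CH (CH = hours, CL = minutes, DH = seconds, DL = hundredths), and the function names are hypothetical.

```python
def unpack_dos_time(cx, dx):
    """Split CX and DX, as returned by INT 21H function 2CH, into fields."""
    hour, minute = cx >> 8, cx & 0xFF          # CH, CL
    second, hundredths = dx >> 8, dx & 0xFF    # DH, DL
    return hour, minute, second, hundredths


def format_dos_time(cx, dx):
    """Render the time as 'hh:mm:ss'."""
    hour, minute, second, _ = unpack_dos_time(cx, dx)
    return f"{hour:02d}:{minute:02d}:{second:02d}"


# Made-up register values: CX = 0E1EH, DX = 2D32H.
print(format_dos_time(0x0E1E, 0x2D32))  # 14:30:45
```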
A more detailed program, which gets the time of day, converts it to the
ASCII characters 'hh:mm:ss' and displays it on the screen, is presented in
Problem (5-13), at the end of the next chapter.
Example 4-10:
Show how to put the cursor in the middle of the screen and print the
letters "Hi" using the BIOS video calls (INT 10H).
Solution:
Use INT 10H, Function 2 (set cursor position) and Function 9 (display a
character) as shown in the following program:
As an exercise, the reader may make use of the last two examples, to get
the system time and display it in the middle of the screen. You may use
the DEBUG program for writing and debugging your assembly program,
for that purpose.
We have also stated earlier that all interrupts and exceptions share a
common feature: the current execution location (CS:IP) and flags are
saved onto the stack, and control is transferred to the interrupt service
routine (ISR). We also mentioned that x86 machines support 256
interrupts, invoked by the instruction INT nn, where nn ranges from 0 to
255. In real mode, the interrupt number (nn) is used to point at a location
in the interrupt vector table (IVT), where the ISR address is stored. The
difference between real mode and protected mode interrupts is that the
IVT is replaced with an interrupt descriptor table (IDT) in protected
mode. The IDT still contains up to 256 entries, but each interrupt level is
accessed via an interrupt gate instead of an interrupt address. Thus the
first 1kB of memory no longer contains interrupt vectors. Instead, the IDT
may be located anywhere in the memory map of the x86 system. In
protected mode, the ISR is reached via a gate in the IDT. In the following
sections we show what a gate is and what the IDT is, in detail.
4-10.1. Gates
A Gate is a system object that points to a procedure in the code segment.
Each gate has a descriptor and a privilege level. There exist 4 types of
gates:
Debug Registers
example, there are instructions that move data between the general-
purpose registers and the XMM or MMX registers, and many of the
integer vector (packed) instructions can operate on either XMM or MMX
registers, although not simultaneously.
• The data are often represented as small quantities, such as 8 bits for
pixel values, 16 bits for audio samples, and 32 bits for object coordinates
in floating-point format. The 128-bit and 64-bit media instructions are
designed to accelerate these applications. The instructions use a form of
vector (or packed) parallel processing known as single-instruction,
multiple data (SIMD) processing. The vector technology has the
following characteristics: A single register can hold multiple independent
pieces of data. For example, a single 128-bit XMM register can hold 16
8-bit integer data elements, or four 32-bit single-precision floating-point
data elements.
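The packing idea can be illustrated without any SIMD hardware. The following is a minimal sketch that models a 128-bit register as a Python integer and adds it lane by lane with wraparound, in the spirit of the packed integer add instructions; the helper names are assumptions, not part of any instruction set.

```python
def pack_lanes(values, lane_bits):
    """Pack a list of integers into one register value, lane 0 lowest."""
    mask = (1 << lane_bits) - 1
    reg = 0
    for i, v in enumerate(values):
        reg |= (v & mask) << (i * lane_bits)
    return reg


def unpack_lanes(reg, lane_bits, total_bits=128):
    """Split a register value into its independent lanes."""
    mask = (1 << lane_bits) - 1
    return [(reg >> (i * lane_bits)) & mask
            for i in range(total_bits // lane_bits)]


def packed_add(a, b, lane_bits, total_bits=128):
    """Add two registers lane by lane; carries never cross a lane boundary."""
    lanes_a = unpack_lanes(a, lane_bits, total_bits)
    lanes_b = unpack_lanes(b, lane_bits, total_bits)
    return pack_lanes([x + y for x, y in zip(lanes_a, lanes_b)], lane_bits)


a = pack_lanes(list(range(16)), 8)     # sixteen 8-bit elements: 0..15
b = pack_lanes([250] * 16, 8)
print(len(unpack_lanes(a, 8)))         # 16 lanes of 8 bits
print(len(unpack_lanes(a, 32)))        # 4 lanes of 32 bits
print(unpack_lanes(packed_add(a, b, 8), 8)[:8])
# [250, 251, 252, 253, 254, 255, 0, 1]
```

A real processor performs all lane additions in a single instruction; the loop here only makes the independence of the lanes visible.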
The following 3DNow! instructions were added with the K6-2 from AMD.
Footnote 4: The IA-32 architecture is the instruction set architecture and
programming environment for Intel's 32-bit microprocessors. The x86-64
architecture, or AMD64, is the instruction set architecture and
programming environment that is a superset of Intel's 32-bit architecture.
It is compatible with the IA-32 architecture, and with the Intel 64
architecture as well as AMD64.
SSE and SSE2 also include floating-point modes in which only the very
first value of the registers is actually modified. Some other unusual
instructions have been added, including a sum of absolute differences
(used for motion estimation in video compression, such as is done in
MPEG) and a 16-bit multiply-accumulate instruction (useful for digital
filtering). The SSE3 and 3DNow! extensions include addition and
subtraction instructions for treating paired floating-point values like
complex numbers. These instruction sets also include numerous fixed
sub-word instructions for shuffling, inserting and extracting values
within the registers.
C. SSSE3 Instructions
These instructions are added with Xeon 5100 series and initial Core 2
Fig. 4-15(b). Instruction format of x86 processors (32-bit instructions),
showing the instruction opcode, the word/byte bit (1 / 0), the register
extension and the register used in EA calculation. The register field code
(REG), addressing mode code (MOD) and R/M code are shown in tables
4-13, 4-14 and 4-15.
Now let us discuss the details of encoding bits in x86 instructions. The
opcode of the instruction is the first byte of the instruction. However,
some opcodes may occupy more than 1 byte. Appendix B depicts the
opcode map of x86 instructions. Within most opcodes there are special
1-bit indicators, namely:
The R/M field, in conjunction with the MOD field, chooses the
addressing mode. The mod field encoding is described in table 4-15.
Also, table 4-16 depicts the R/M encoding bits.
Table 4-15. MOD field bits, in the addressing mode byte of x86 instructions
The MOD field chooses the memory mode. It also chooses the size of the
displacement (zero, one, two, or four bytes) that follows the instruction
for memory addressing modes. If MOD=00, then you have one of the
addressing modes without a displacement (register indirect or base/
indexed). If MOD does not equal 11, the R/M field encodes the memory
addressing mode as follows:
Table 4-16. R/M field bits in the addressing mode byte of 80x86 instructions

R/M (mmm)   Addressing mode (assuming MOD = 00, 01, or 10)
000         [BX+SI] or DISP[BX+SI] (depends on MOD)
001         [BX+DI] or DISP[BX+DI] (depends on MOD)
010         [BP+SI] or DISP[BP+SI] (depends on MOD)
011         [BP+DI] or DISP[BP+DI] (depends on MOD)
100         [SI] or DISP[SI] (depends on MOD)
101         [DI] or DISP[DI] (depends on MOD)
110         Displacement-only or DISP[BP] (depends on MOD)
111         [BX] or DISP[BX] (depends on MOD)
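The table above can be read as a small lookup. The following sketch assumes the 16-bit (8086) addressing forms only and uses a hypothetical function name, decode_rm; it returns a text description rather than decoding a real instruction stream.

```python
# Base/index combinations selected by the R/M field (mmm), 16-bit addressing.
RM_BASE = ["BX+SI", "BX+DI", "BP+SI", "BP+DI", "SI", "DI", "BP", "BX"]


def decode_rm(mod, rm):
    """Describe the memory addressing mode chosen by the MOD and R/M fields."""
    if mod == 0b11:
        raise ValueError("MOD = 11 selects a register operand, not memory")
    if mod == 0b00:
        # Special case: R/M = 110 means displacement-only, not [BP].
        return "[disp16]" if rm == 0b110 else f"[{RM_BASE[rm]}]"
    disp = "disp8" if mod == 0b01 else "disp16"
    return f"{disp}[{RM_BASE[rm]}]"


print(decode_rm(0b00, 0b000))  # [BX+SI]
print(decode_rm(0b00, 0b110))  # [disp16]
print(decode_rm(0b01, 0b110))  # disp8[BP]
print(decode_rm(0b10, 0b111))  # disp16[BX]
```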
Example 4-11:
Given that the opcode of MOV instruction is 100010, show how to
encode the following instruction: MOV BP,SP
Solution:
Opcode = MOV (100010)
D = transfer to register (1)
W = word (1)
REG = BP (101)
MOD = register mode (11)
R/M = SP (100)
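Putting the fields of this solution together gives the two machine-code bytes. The sketch below is an illustration under stated assumptions: it handles only the 16-bit register-to-register form with D = 1 and W = 1, and the register codes in REG16 are the standard 8086 values (the function name is hypothetical).

```python
# Standard 8086 codes for the 16-bit registers (REG and R/M fields, W = 1).
REG16 = {"AX": 0, "CX": 1, "DX": 2, "BX": 3, "SP": 4, "BP": 5, "SI": 6, "DI": 7}


def encode_mov_reg16(dst, src):
    """Encode MOV dst, src for two 16-bit registers (D = 1, W = 1, MOD = 11)."""
    opcode = 0b100010
    d, w = 1, 1
    first = (opcode << 2) | (d << 1) | w                    # 100010 D W
    modrm = (0b11 << 6) | (REG16[dst] << 3) | REG16[src]    # MOD REG R/M
    return bytes([first, modrm])


code = encode_mov_reg16("BP", "SP")
print(" ".join(f"{b:08b}" for b in code))  # 10001011 11101100
print(" ".join(f"{b:02X}" for b in code))  # 8B EC
```

So MOV BP,SP assembles to the two bytes 8BH and ECH.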
Example 4-12:
The following assembly program calculates the sum of the ten integers (1
through 10) into the microprocessor accumulator. Encode the program
into equivalent 8088 machine code.
Solution:
When the instructions are decoded using the 8088 opcodes, we have:
As mentioned earlier, MOV makes a copy of Source and stores its value
into Destination. It overwrites the previous value in Destination and does
not affect the original contents of Source. However, the encoding of the
MOV instruction is probably the most complex in the instruction set.
Nonetheless, without studying the machine code for this instruction you
will not be able to appreciate it, nor will you have a good understanding
of how to write optimal code using this instruction.
There are several versions of the MOV instruction. The mnemonic MOV
describes over a dozen different instructions on 80x86 processors. The
most commonly used form of the MOV instruction has the following binary
encoding scheme, shown in figure 4-16.
The opcode of MOV is the first eight bits of the instruction. Bits zero and
one define the width W of the instruction (Byte, Word, or Double Word)
and the direction D of the transfer. Sometimes, the values of D and W
will be filled for you, as a part of the opcode.
Note that at least one of the operands is always a general purpose register.
If present, the REG field in the addressing mode byte specifies that
register. As we pointed out so far, the bits in the REG field (rrr) let you
select one of eight different registers. Table 4-12 depicts the REG field
bits. The R/M field (mmm), with the MOD field (oo), choose the
addressing mode. The MOD field encoding is shown in table 4-14.
The MOD field chooses between a register-to-register move and a move
between a register and memory. It also chooses the size of the
displacement (zero, one, two, or four bytes) that follows the instruction
for memory addressing modes. If MOD = 00, then you have one of the
addressing modes without a displacement (register indirect or
base/indexed). Note the special case where MOD = 00 and R/M = 110, as
indicated in rows 5 and 6. This would normally correspond to the [BP]
addressing mode. The 8086 uses this encoding for the displacement-only
addressing mode.
This means that there is no true [BP] addressing mode on the 8086. In
order to understand why you can use the [BP] addressing mode in your
programs, look at MOD = 01 and MOD = 10 in the above table. These bit
patterns activate the disp[reg] and the disp[reg][reg] addressing modes.
This is not the same as the [BP] addressing mode. However, consider the
following instructions:
These statements, using the indexed addressing modes, perform the same
operations as their register indirect counterparts (obtained by removing
the displacement from the above instructions). The only real difference
between the two forms is that the indexed addressing mode is one byte
longer (if MOD = 01, two bytes longer if MOD = 10) to hold the
displacement of zero.
Because they are longer, these instructions may also run a little slower.
This trait of the 80x86, providing two or more ways to accomplish the
same thing, appears throughout the instruction set. In fact, you will see
more examples before you are through with the MOV instruction. Table
4-14 depicts the R/M field encodings, when MOD does not equal 11.
Don't forget that addressing modes involving BP use the stack segment
(SS) by default. All others use the data segment (DS) by default. If this
discussion has got you totally lost, you haven't even seen the worst of it
yet. Keep in mind that these are just some of the 8086 addressing modes.
You've still got all the 80386 addressing modes to look at. You're
probably beginning to understand what they mean when they say
complex instruction set computer. A full description of the x86 opcode
map can be found in appendix C.
There are several important facts you should always remember about the
MOV instruction. First of all, there is no memory to memory move. For
some reason, newcomers to assembly language have a hard time grasping
this point. While there are a couple of instructions that perform memory
to memory moves, loading a register and then storing that register is
almost always more efficient.
The number of clocks (N) is the sum of the basic clocks (No) and the
time required to calculate the effective address (EA), if a memory
operand is involved.
Example 4-13.
Refer to Appendix B and calculate the ADD instruction execution time
for different variations. Which ADD instruction is the fastest one?
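The kind of calculation asked for here can be sketched as N = No + EA. The clock counts below are the commonly published 8086 figures for ADD and for effective-address calculation; they are an assumption of this sketch and should be checked against the timing tables of Appendix B. The table keys and the function name are illustrative.

```python
# Effective-address calculation clocks (assumed 8086 figures).
EA_CLOCKS = {
    "disp16": 6,
    "[BX]": 5, "[SI]": 5, "[DI]": 5,
    "disp[BX]": 9, "disp[BP]": 9, "disp[SI]": 9, "disp[DI]": 9,
    "[BX+SI]": 7, "[BP+DI]": 7,
    "[BP+SI]": 8, "[BX+DI]": 8,
    "disp[BX+SI]": 11, "disp[BP+DI]": 11,
    "disp[BP+SI]": 12, "disp[BX+DI]": 12,
}

# Basic clocks No for the forms of ADD (assumed 8086 figures).
ADD_BASE_CLOCKS = {
    "reg,reg": 3, "reg,imm": 4, "reg,mem": 9, "mem,reg": 16, "mem,imm": 17,
}


def add_clocks(form, ea_mode=None):
    """Return N = No + EA for one form of ADD."""
    n0 = ADD_BASE_CLOCKS[form]
    if "mem" in form:
        if ea_mode is None:
            raise ValueError("memory forms need an addressing mode")
        return n0 + EA_CLOCKS[ea_mode]
    return n0


print(add_clocks("reg,reg"))                  # 3
print(add_clocks("reg,mem", "[BX]"))          # 9 + 5 = 14
print(add_clocks("mem,reg", "disp[BX+SI]"))   # 16 + 11 = 27
```

Under these figures the register-to-register form is the fastest, since it needs no effective-address calculation and no memory access.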
In much the same way as x86 instructions, the SPARC basic instructions
can be divided into the following categories.
Memory access
Integer operate
Control transfer
State register access
Floating-point operate
Conditional move
Register window management
TPC, TNPC, and TSTATE are entries in a hardware trap stack, where the
number of entries in the trap stack is equal to the number of trap levels
supported (impl. dep. #101).
A trap also sets bits in the PSTATE register, one of which can enable an
alternate set of global registers for use by the trap handler. Normally, the
CWP is not changed by a trap; on a window spill or fill trap, however, the
CWP is changed to point to the register window to be saved or restored.
a: The a bit annuls the execution of the following instruction if the branch
is conditional and untaken, or if it is unconditional and taken.
cc0, cc1, and cc2: specify the condition codes (icc, xcc, fcc0, fcc1, fcc2,
fcc3) to be used in the instruction.
Individual bits of the same logical field are present in several other
instructions: Branch on Floating-Point Condition Codes with Prediction
Instructions (FBPfcc), Branch on Integer Condition Codes with
Prediction (BPcc), Floating-Point Compare Instructions, Move Integer
Register if Condition is Satisfied (MOVcc), Move Floating-Point
Register if Condition is Satisfied (FMOVcc), and Trap on Integer
Condition Codes (Tcc). In instructions such as Tcc that do not contain the
cc2 bit, the missing cc2 bit takes on a default value. See table 38 on page
279 for a description of these fields' values.
cond: This 4-bit field selects the condition tested by a branch instruction.
d16hi and d16lo: These 2-bit and 14-bit fields together comprise a
word-aligned, sign-extended, PC-relative displacement for a
branch-on-register-contents with prediction (BPr) instruction.
disp22 and disp30: These 22-bit and 30-bit fields are word-aligned, sign-
extended, PC-relative displacements for a branch or call, respectively.
fcn: This 5-bit field provides additional opcode bits to encode the DONE
and RETRY instructions.
i: The i bit selects the second operand for integer arithmetic and
load/store instructions. If i = 0, the operand is r[rs2]. If i = 1, the operand
is simm10, simm11, or simm13, depending on the instruction, sign-
extended to 64 bits.
imm22: This 22-bit field is a constant that SETHI places in bits 31..10 of
a destination register.
imm_asi: This 8-bit field is the address space identifier in instructions
that access alternate space.
impl-dep: The meaning of these fields is completely implementation-
dependent for IMPDEP1 and IMPDEP2 instructions.
mmask: This 4-bit field imposes order constraints on memory references
appearing before and after a MEMBAR instruction.
op and op2: These 2- and 3-bit fields encode the three major formats and
the Format 2 instructions.
op3: This 6-bit field (together with one bit from op) encodes the Format 3
instructions.
opf: This 9-bit field encodes the operation for a floating-point operate
(FPop) instruction.
opf_cc: Specifies the condition codes to be used in FMOVcc instructions.
See cc0, cc1, and cc2 above for details.
opf_low: This 6-bit field encodes the specific operation for a Move
Floating-Point Register if Condition is satisfied (FMOVcc) or Move
Floating-Point register if contents of integer register match condition
(FMOVr) instruction.
p: This 1-bit field encodes static prediction for BPcc and FBPfcc
instructions, as follows:
p Branch prediction
0 Predict branch will not be taken
1 Predict branch will be taken
rcond: This 3-bit field selects the register-contents condition to test for a
move based on register contents (MOVr or FMOVr) instruction or a
branch on register contents with prediction (BPr) instruction.
rd: This 5-bit field is the address of the destination (or source) r or f
register(s) for a load, arithmetic, or store instruction.
rs1: This 5-bit field is the address of the first r or f register(s) source
operand.
rs2: This 5-bit field is the address of the second r or f register(s) source
operand with i = 0.
shcnt32: This 5-bit field provides the shift count for 32-bit shift
instructions.
shcnt64: This 6-bit field provides the shift count for 64-bit shift
instructions.
simm10: This 10-bit field is an immediate value that is sign-extended to
64 bits and used as the second ALU operand for a MOVr instruction
when i = 1.
simm11: This 11-bit field is an immediate value that is sign-extended to
64 bits and used as the second ALU operand for a MOVcc instruction
when i= 1.
simm13: This 13-bit field is an immediate value that is sign-extended to
64 bits and used as the second ALU operand for an integer arithmetic
instruction or for a load/store instruction when i = 1.
sw_trap#: This 7-bit field is an immediate value that is used as the
second ALU operand for a Trap on Condition Code instruction.
x: The x bit selects whether a 32- or 64-bit shift will be performed.
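Several of the immediate fields above (simm10, simm11, simm13) are described as sign-extended to 64 bits. A minimal sketch of that operation follows; the function name sign_extend is an assumption, and the result is returned as an unsigned 64-bit pattern.

```python
def sign_extend(value, bits, width=64):
    """Sign-extend a `bits`-wide field to `width` bits (unsigned pattern)."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):      # top bit of the field is set: negative
        value -= 1 << bits
    return value & ((1 << width) - 1)


print(hex(sign_extend(0x1FFF, 13)))  # 0xffffffffffffffff, i.e. -1
print(hex(sign_extend(0x0FFF, 13)))  # 0xfff, positive values are unchanged
print(hex(sign_extend(0x400, 11)))   # 0xfffffffffffffc00, i.e. -1024
```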
In the first format the 32-bit instruction is divided into seven fields. The
first field (reading from the left) holds the 2-bit value 11, while the fifth
field (bit 13) holds the 1-bit value 0. These bits are the same for all load
and store instructions that use two source registers. The sixth field (bits 5
through 12) holds the address space identifier, asi. For the present, we
will always set the asi field to zero. The remaining fields, rd, op3, rs1, and
rs2, hold encodings for the destination register, the operation, and the two
source registers, respectively. Registers are encoded using the 5-bit
binary representation of the register number. Table 4-19 summarizes the
operation codes for the load and store instructions.
Table 4-19: Operation encodings for the load and store operations
Example 4-14.
Show how to assemble the following load instruction:
Bits:    31-30   29-25   24-19    18-14   13   12-5       4-0
Field:   op      rd      op3      rs1     i    asi        rs2
Value:   11      01011   000011   00100   0    00000000   00111
That is, 1101 0110 0001 1001 0000 0000 0000 0111 in binary or
0xD6190007 in Hexadecimal.
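The field values of this example can be packed programmatically to confirm the hexadecimal result. This is a minimal sketch for the two-source-register load/store format only (i = 0); the function name and its parameter names follow the field names used in the text but are otherwise an assumption.

```python
def encode_format3_reg(op, rd, op3, rs1, rs2, asi=0):
    """Pack a load/store word: op | rd | op3 | rs1 | i=0 | asi | rs2."""
    i = 0  # i = 0 selects the register (rs2) form
    return ((op << 30) | (rd << 25) | (op3 << 19) |
            (rs1 << 14) | (i << 13) | (asi << 5) | rs2)


word = encode_format3_reg(op=0b11, rd=0b01011, op3=0b000011,
                          rs1=0b00100, rs2=0b00111)
print(f"{word:032b}")   # 11010110000110010000000000000111
print(f"0x{word:08X}")  # 0xD6190007
```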
Note that angle brackets, <>, enclose alternative forms of the operand;
braces, {}, enclose optional operands; and Op2 is a flexible second
operand that can be either a register or a constant. Note also that most
instructions can use an optional condition code suffix.
B label Branch -
LDRB, LDRBT     Rt, [Rn, #offset]        Load Register with byte        -
LDRD            Rt, Rt2, [Rn, #offset]   Load Register with two words   -
LDRH, LDRHT     Rt, [Rn, #offset]        Load Register with Halfword    -
LDRSB, LDRSBT   Rt, [Rn, #offset]        Load Register with Signed Byte -
NOP - No Operation -
CMP R1, R2
LDREQ R3, [R4]
LDRNE R3, [R5]
4-21. Summary
Note that there are some special combinations of segment registers and
general registers that point to important addresses. For instance, CS:IP
points to the address where the processor will fetch the next byte of code.
SS:SP points to the location of the last item pushed onto the stack.
DS:SI is often used to point to data that is about to be copied to ES:DI.
When x86 processors are powered up or re-initialized (by reset), the
CPU is in real mode (with real addresses), all protection features are
disabled, and the memory space is limited to 1MB of physical memory.
This is very similar to what happens with earlier IBM PCs operating in
real mode. We can therefore assume an IBM PC equipped with an 8088
microprocessor to discuss the initialization (boot-up) process. The
boot-up process begins when the PC is powered up or reset. This will
execute a jump instruction at address F000:FFF0 inside the ROM BIOS
chip that points to the first instruction of the BIOS. The ROM BIOS
program is approximately 8K bytes long and controls all of the hardware
on the system board and interface cards. The CPU support chips are
initialized with the proper default values to control such things as the
video monitor, disk drives, printer ports and keyboard. After the
initialization of all the hardware, the program executes a very extensive
diagnostic test on the x86 CPU, ROMs, RAM etc. to complete what
is called the Power-On Self-Test (POST).
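The reset address F000:FFF0 quoted above is a real-mode segment:offset pair. As a small sketch of how such a pair maps to a 20-bit physical address (segment times 16 plus offset, wrapped to the 1MB space of the 8088), with a hypothetical function name:

```python
def physical_address(segment, offset):
    """Real-mode address: segment * 16 + offset, wrapped to 20 bits (1MB)."""
    return ((segment << 4) + offset) & 0xFFFFF


print(f"{physical_address(0xF000, 0xFFF0):05X}")    # FFFF0, the reset address
print(f"{physical_address(0xFFFF, 0x0000):05X}")    # FFFF0, same location
print(f"{physical_address(0x0000, 4 * 0x21):05X}")  # 00084, IVT entry of INT 21H
```

The first two lines show that different segment:offset pairs can name the same physical byte, which is why the reset address is sometimes written FFFF:0000.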
If there are no critical errors during POST, the default disk drive (e.g., C:)
is turned ON and tested. The pass condition will cause the head to
position over track 0, head 0, sector 0 of the disk and the boot loader
program is transferred into memory.
Once loaded in memory, the boot loader program is given control of the
CPU and a series of instructions are executed that will look in the
directory of the disk for the system files, dos.sys and bio.sys. If these two
system files are on the disk, they are loaded into low memory in that
order, along with any driver programs that are listed in the "device ="
statements of the config.sys file.
Control of the CPU is then given to the DOS program to finish the boot
up process, by loading the command processor program command.com
into memory in the next available space right after dos.sys. The boot
process is complete when command.com is given the final control of the
CPU. So, the operating system is made up of three basic programs
(ibmbio.sys, ibmdos.sys and command.com) that are loaded in low
memory starting at 00000H and ending at 0B000H. The actual ending
address will depend on the version of DOS and the number of device
drivers that are loaded during the boot process.
4-22. PROBLEMS
[4-2] Find the contents of all the affected 8086 microprocessor registers
and flags, after each line of the following program has run. For each line,
indicate the new value for the registers that change.
MOV CL,08H
MOV BL,0B2H
MOV AL,7FH
ADD BL,AL
SHL AL,CL
AND AL,0FH
SUB BL,AL
[4-4] Explain with examples all the stack operations supported by the 80x86
family of CPUs (e.g., PUSH reg, PUSHA, PUSHF, POP reg, POPA,
POPF). Draw schematic representations of the stack area before and after
execution of these instructions.
[4-8] Write an 8086 assembly program that fills a 1000D-byte block of
memory in the extra segment, beginning at address BLOCK, with the data
byte 20H (ASCII space).
[4-9] Examine and encode the following portion of a list file for a real-mode
program. Assume that register BX initially contains 1234H.
CMP BX,4 ; Be sure BX is in range
JNC ERROR
SHL BX,1 ; Convert to word offset
MOV BX,TABLE[BX] ; Index into table
TABLE: DW PROC0
DW PROC1
DW PROC2
DW PROC3
ERROR:
[4-10] Write an assembly program that inputs two 8-bit unsigned numbers
from input ports A0H and B0H, and outputs their product to the 16-bit output
port 7080H.
[4-11] Obtain the approximate decimal values of the following numbers,
which are given in the IEEE 754 single-precision floating point format
(sign and exponent bits first, followed by the 23-bit fraction):
A = 100101111 10000000000000000000000
B = 010001110 00000000000000000000001
[4-13] The BOOT sector files of the system are stored in _____ .
a) Hard disk b) ROM
c) RAM d) Fast solid state chips in the motherboard
[4-16] In Intel Pentium processors, the size of the floating point registers
can be extended up to _____ .
a) 128 bit b) 256 bit
c) 80 bit d) 64 bit
[4-17] Find the contents of all the affected 8086 microprocessor registers
and flags, after each line of the following program has run. For each line,
indicate the new value for the registers that change. If the instruction is
not a legal instruction, write "ILLEGAL" anywhere inside the box. If no
registers change, write "NONE" anywhere inside the box. Assume the
following status before each part:
Registers: AX = 0002, BX = 0114, CX = 0003, DX = FF05, SI = 0003
Memory: ARRAY DW 5,4,3,2,1
[4-18] Show how to load the accumulator high byte from the Flag register,
(AH) ← (Flags), on an 8088 microprocessor system. Show also how to perform
the inverse process, storing AH into the Flags, (Flags) ← (AH).
[4-19] Show how to clear and set the interrupt flag on an 8088
microprocessor system.
4-23. Bibliography
[4] [15] Peter Norton et al, PC Programming Bible, Microsoft Press, 1996.
Prof. Dr. Muhammad El-SABA
Introduction to Microprocessors & Interface Circuits CHAPTER 5
Assembly Language:
Programming, Compilation &
Debugging
Contents
5-1. Introduction
5-2. DEBUG Program
5-3. Macro Assembler Programs
5-4. Assembly Language Instructions Format.
5-5. Assembler Data Types.
5-6. Assembler Directives
5-7. Declaring Variables
5-8. Modifiers & Attribute Operators
5-9. Difference between Values, Addresses and Pointers
5-10. Arrays in Assembly Language
5-11. Tables & Lookup Tables in Assembly Language
5-12. Other Data Structures in Assembly Language (Queues, Linked lists,..)
5-13. Working with Strings in Assembly Language
5-14. Procedures in Assembly Programs
5-15. Functions in Assembly Programs
5-16. Writing & Initializing Interrupts in Assembly Programs
5-17. Creating Macros in Assembly Programs
5-18. Assembly Program Compilation & Linking
5-19. 16-Bit Macro-Assemblers
5-20. MASM Assembler Syntax for x86 memory Addressing Modes
5-21. 32-Bit Macro-Assemblers (MASM32)
5-22. 64-Bit Macro-Assemblers (YASM)
5-23. Summary of x86 Macro-assembler Programs
5-24. Summary
5-25. Problems
5-1. Introduction
As outlined in Chapter 1, a program is a sequence of simple
commands that lead the computer to solve some problem. Once the
program is written and debugged, the computer can execute its
instructions. We have also indicated that the assembly language of a
given processor is a collection of instructions, which have to be translated
into bit patterns, or machine code, in order to be executed by the
microprocessor. Assembly language has several benefits:
Table 5-1. DEBUG instructions. The square brackets [ ] contain optional parameters.
Starting DEBUG this way allows you to work on the internal hardware
of the computer and view the contents of all of the memory locations in
RAM. You can also load as many as 128 sectors of a floppy or hard
disk and view, edit or move their contents to another location. You can
also use DEBUG to perform many other tasks, such as:
C:\> DEBUG
-A 0100
14BA:0100 MOV CX,0A
14BA:0103 MOV AX,0
14BA:0106 ADD AX,CX
14BA:0108 LOOP 0106
-
Fig. 5-1. Calling the DEBUG program. The assemble command “-A” is used to
create an assembly program at a given address (here 0100).
One can also trace the program execution step-by-step, using the “T”
command. The trace command displays the content of 8086 registers after
execution of each line of program as shown in figure 5-2.
-T
AX=0000 BX=0000 CX=000A DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=14BA ES=14BA SS=14BA CS=14BA IP=0103 NV UP EI PL NZ NA PO NC
14BA:0103 B80000 MOV AX,0000
-T
AX=0000 BX=0000 CX=000A DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=14BA ES=14BA SS=14BA CS=14BA IP=0106 NV UP EI PL NZ NA PO NC
14BA:0106 01C8 ADD AX,CX
-T
AX=000A BX=0000 CX=000A DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=14BA ES=14BA SS=14BA CS=14BA IP=0108 NV UP EI PL NZ NA PE NC
14BA:0108 E2FC LOOP 0106
-T
AX=000A BX=0000 CX=0009 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=14BA ES=14BA SS=14BA CS=14BA IP=0106 NV UP EI PL NZ NA PE NC
14BA:0106 01C8 ADD AX,CX
-
One can also list any part of memory using the dump “-D” command.
For instance, one can use the dump command to list the BIOS date, which
is stored at memory addresses F000:FFF5 through F000:FFFD of the
ROM BIOS of the IBM PC:
C:\> DEBUG
-D F000:FFF5 FFFD
F000:FFF0 31 31 2F-31 32 2F 30 37 00 11/12/07.
-
Fig. 5-3(a). Using the DEBUG program to display memory contents, with “-D”
command.
Figure 5-3(b) shows how to enter a value using the “-E”
command, store it at memory address [210] and calculate its square root
(using coprocessor instructions). A 16-bit integer constant, which is
stored at memory address [210], is read using the FILD (Load Integer)
coprocessor instruction. This number is stored internally in the
80x87 as an 80-bit floating point value. After taking the square root,
using FSQRT, the floating-point result is stored, using FSTP, at
memory address [200] for inspection.
C:\> DEBUG
-A 100
FILD word [210]
FSQRT
FSTP qword [200]
INT 20
-E 210 ; Enter the value 0005 as a 16-bit integer
3AA0:0210 00.05 00.00 ; Enter the value 05 at address [210]
; and the value 00 at address [211]
-G ; Go! Run the program
-D 200
3AA0:0200 A8 F4 97 9B 77 E3 01 40
3AA0:0210 05 00
Fig. 5-3(b). Using the DEBUG program to enter data, with “-E” command, and run
programs using the “-G” command.
Note that the 16-bit value of the integer 5 reads 0005, and the 64-bit value
of the square root of 5 reads 4001E3779B97F4A8 (the bytes are stored in
memory with the least significant byte first).
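The 64-bit pattern can be checked by hand. The following worked decoding is a sketch that assumes the standard IEEE 754 double-precision layout (1 sign bit s, 11 exponent bits e biased by 1023, and 52 fraction bits f), which is the format written by FSTP with a qword operand:

```latex
\begin{aligned}
\texttt{4001E3779B97F4A8H}:\quad
  & s = 0, \qquad e = \texttt{400H} = 1024, \qquad f = \texttt{1E3779B97F4A8H} \\
\text{value}
  &= (-1)^{s}\left(1 + \frac{f}{2^{52}}\right) 2^{\,e-1023}
   \approx 1.1180340 \times 2^{1}
   \approx 2.2360680 \approx \sqrt{5}
\end{aligned}
```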
The “-L” command is also used to load a disk sector. For instance, the
following commands load and examine a sample boot sector:
C:\>DEBUG
-L0000 2 0 1
-D0000 001F
1026:0000 EB 3C 90 4D 53 44 4F 53-35 2E 30 00 02 04 01 00 .<MSDOS5.0
1026:0010 02 00 02 00 00 F8 F8 00-11 00 10 00 11 00 00 00 ... ... ..
-
Fig. 5-3(c). Using the DEBUG program to load a disk sector, with “-L” command.
The DEBUG program can also be called at the DOS prompt with the name of a
binary file that you would like to load, unassemble (decode from
binary to assembly) and edit, by typing DEBUG followed by the file name.
DEBUG is then loaded into memory along with the file that is
specified on the command line, and the first byte of the file is placed at
offset 100H of the work area. By starting DEBUG this way, you can view,
edit or move a COM program (smaller than 64 kB).
4. All the data, code, and the stack area are in the same segment.
Machine Instructions
Assembler Directives
Assembler Controls
Machine instructions are the machine code that can be executed by the
microprocessor. Appendix A provides an overview about 80x86 machine
instructions. Detailed discussion of the 80x86 instructions can be found in
chapter 4 and Appendix B.
Assembler controls set the assembly modes and direct the assembly
flow. Table 5-3 contains a guide to all the assembler controls.
where:
Format 2
Constant Directive Operand
Example
NUM2 EQU 18H ; let NUM2=18H
Format 3
Variable Directive Operand
Examples
VAR9 DB 00 ; let VAR9 be a byte
; variable and fill it with 00
MSG DB "Hello" ; let MSG be a byte string
; variable = Hello
String2 DB "Hey",0 ; String2 is a zero-terminated
; byte string variable = Hey
X BYTE 1 ; X is a byte whose value = 1
Y SBYTE -2 ; Y is a signed byte whose
; initial value = -2
The directive DB (or BYTE or SBYTE) is short for define byte, and
MSG is an array of bytes (an ASCII character takes up one byte). Data
can be declared in a number of sizes, like bytes (DB), words (DW),
double words (DD) and quad words (DQ). Note that DB is the older
directive; MASM 6.x and later assemblers also accept the synonyms BYTE
and SBYTE. More details about data types and assembler directives will
come in the following sections.
As for operands, there are three basic types of operands that can be used
in assembly instructions: immediate, register and memory operands.
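The three operand types can be seen side by side in the following minimal sketch. It assumes the MASM simplified segment directives and a DOS environment; the variable name COUNT and the label START are illustrative only.

```asm
        .MODEL SMALL
        .STACK 100H
        .DATA
COUNT   DW   0              ; hypothetical word variable in memory
        .CODE
START:  MOV  AX, @DATA      ; set up DS so that COUNT can be addressed
        MOV  DS, AX
        MOV  AX, 1234H      ; immediate operand: the constant is part of the instruction
        MOV  BX, AX         ; register operand: the source is another register
        MOV  COUNT, AX      ; memory operand (direct): the destination is a variable
        MOV  BX, OFFSET COUNT
        ADD  AX, [BX]       ; memory operand (register indirect): AX = 1234H + 1234H = 2468H
        MOV  AX, 4C00H      ; terminate program
        INT  21H
        END  START
```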
In the above example we see that the assembly program is divided into
three parts, each of which is called a program segment. Segments begin with
the segment name followed by the reserved word SEGMENT and end with
the segment name followed by ENDS. Note that some lines of assembly
modules may contain only assembler directives, instead of
microprocessor instructions. It should also be noted that your style of
writing assembly language programs is almost as important as your accuracy.
Good habits in layout, selection of symbolic names, and appropriate
comments help you to program correctly and easily.
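A minimal sketch of a module with this three-segment layout is given below. The segment names DATA_SEG, STACK_SEG and CODE_SEG, the label START and the message are illustrative assumptions; the program simply prints a string with DOS function 9 and terminates.

```asm
DATA_SEG  SEGMENT                   ; data segment
MSG       DB   "Hello$"             ; '$'-terminated string for DOS function 9
DATA_SEG  ENDS

STACK_SEG SEGMENT STACK             ; stack segment
          DW   64 DUP(?)            ; reserve 64 words of stack
STACK_SEG ENDS

CODE_SEG  SEGMENT                   ; code segment
          ASSUME CS:CODE_SEG, DS:DATA_SEG, SS:STACK_SEG
START:    MOV  AX, DATA_SEG         ; directive-only lines above, instructions here
          MOV  DS, AX               ; DS = data segment
          MOV  DX, OFFSET MSG       ; DS:DX points to the message
          MOV  AH, 9                ; DOS function 9 - display string
          INT  21H
          MOV  AX, 4C00H            ; terminate program
          INT  21H
CODE_SEG  ENDS
          END  START
```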
Table 5-3. Summary of the x86 macro assembler directives and pseudo-ops.
Directive                   Description
.286, .386, .486, .586, …   Processor directives
.8087, .287, .387, .NO87    Coprocessor directives
.CODE                       Start of code segment
.DATA                       Start of data segment
.EXIT                       Exit to DOS
.MODEL                      Select memory model (small, medium, large, etc.)
.STARTUP                    Mark the start (entry point) of the program
ABS                         Absolute (constant) type, used with external symbols
ALIGN                       Align to a specified boundary (e.g., ALIGN 2 for a word boundary)
ASSUME sr:sy(,...)          Assume segment register name(s)
ASSUME NOTHING              Remove all former assumptions
BYTE                        Byte type operation (=DB)
DB e(,...)                  Define Byte(s)
DD e(,...)                  Define Double word(s)
DQ                          Define Quad word(s)
DT                          Define Ten byte(s)
DUP                         Generate duplicates of a variable or constant
DWORD                       Double word operation (=DD)
DW e(,...)                  Define Word(s)
END                         End of program
ENDM                        End of macro
ENDP                        End of procedure
ENDS                        End of segment
EQU                         Assign a constant value to a name
EXTRN sy:t(,...)            External symbol(s) (t=ABS/BYTE/DWORD/FAR/NEAR/WORD)
FAR                         Both IP and CS registers are altered
HIGH                        High-order 8 bits of a 16-bit value
IF, ELSE, ENDIF             Conditional assembly pseudo-ops
LABEL t                     Label (t=BYTE/DWORD/FAR/NEAR/WORD)
LENGTH                      Number of basic units
LOW                         Low-order 8 bits of a 16-bit value
NEAR                        Only the IP register needs to be altered
OFFSET                      Offset portion of an address
ORG                         Define program starting address (origin)
PAGE n1, n2                 Number of lines per page, maximum number of characters per line
PROC t                      Procedure (t=FAR/NEAR, default NEAR)
PTR                         Override the type (size) of a variable or label
SEG                         Segment portion of an address
SHORT                       One-byte displacement for a JMP operation
SIZE                        Number of bytes defined by statement
TITLE                       Title line (header of each page)
TYPE                        Number of bytes in the unit defined
WORD                        Word operation (=DW)
In addition, the DF/FWORD directive declares 48-bit pointers for use in
32-bit protected mode on the 80386 and later processors. You should only
use this directive for 48-bit far pointers. DQ/QWORD lets
you declare quadword (8-byte) variables. The original purpose of this
directive was to let you create 64-bit double precision floating point
variables and 64-bit integer variables. There are better directives for
creating floating point variables. The DT/TBYTE directives allocate
10 bytes of storage.
There are two data types indigenous to the 80x87 coprocessor that use
10-byte data: ten-byte BCD values and extended precision (80-bit)
floating point values. As for the floating point types, you can use REAL4,
REAL8 and REAL10 to reserve 4, 8, and 10 bytes, respectively. The operand
field of these statements may contain a question mark (if you don't want to
initialize the variable) or an initial value in floating point
form. The following examples demonstrate their use:
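A few illustrative declarations are sketched below, assuming MASM 6 or later; the variable names and initial values are hypothetical.

```asm
        .MODEL SMALL
        .DATA
Temp    REAL4   ?                   ; 4 bytes, single precision, uninitialized
Pi      REAL8   3.14159265358979    ; 8 bytes, double precision
Big     REAL10  1.0E+100            ; 10 bytes, extended precision
Half    DD      0.5                 ; older style: DD also accepts a real constant
        END
```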
In addition, you can also define your own types using TYPEDEF
directive, in MASM 6 and later assemblers.
If you're writing a big assembly program, you'd rather divide it into several
modules (files). The EXTRN directive is used to tell the assembler that the
symbols following it are already defined (declared) in another assembly
module. Also, the directive PUBLIC may be used to tell the assembler that
the symbols following it are shared for all modules.
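As a minimal sketch of how EXTRN and PUBLIC work together, consider two hypothetical source files, MAIN.ASM and UTIL.ASM, that are assembled separately and then linked. All names (Total, AddOne, START) are illustrative assumptions. Each EXTRN is placed inside the segment in which the symbol is defined, so that the assembler associates it with the right segment.

```asm
; ---- File MAIN.ASM (hypothetical) ----
        .MODEL SMALL
        .STACK 100H
        .DATA
        EXTRN  Total:WORD       ; variable defined in UTIL.ASM
        .CODE
        EXTRN  AddOne:NEAR      ; procedure defined in UTIL.ASM
START:  MOV    AX, @DATA
        MOV    DS, AX
        CALL   AddOne           ; call the external procedure
        MOV    BX, Total        ; read the external variable
        MOV    AX, 4C00H        ; terminate program
        INT    21H
        END    START

; ---- File UTIL.ASM (hypothetical) ----
        .MODEL SMALL
        PUBLIC Total, AddOne    ; make these symbols visible to other modules
        .DATA
Total   DW     0
        .CODE
AddOne  PROC   NEAR
        INC    Total            ; Total = Total + 1
        RET
AddOne  ENDP
        END
```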
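PTR operator. One of the purposes of the PTR operator is to specify the
length of a quantity in ambiguous situations. It is written after the desired
type to specify the length of an operand of unknown length. For instance, the
following instruction:
INC [BX]
is ambiguous, because the assembler cannot tell whether a byte or a word
is to be incremented; writing INC BYTE PTR [BX] or INC WORD PTR [BX]
removes the ambiguity. In contrast, no PTR operator is needed in:
ADD AX,[BX]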
Here, the assembler will assume that the operand whose address is in BX is a
word (because AX is a 16-bit register) and will assemble a word addition.
OFFSET Operator. The OFFSET operator returns the value of the offset
address (EA) of a variable or a label. Similarly, the SEG operator returns
the segment part of an address; for instance, moving SEG Op2 into a register
will put the address of the (data) segment containing the variable Op2
inside that register.
TYPE Operator. The TYPE operator is used primarily with variables and
structures to return the number of bytes associated with them. So, if Array1
is an array of bytes, then TYPE Array1 returns 1.
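The following fragment sketches the PTR, OFFSET, SEG and TYPE operators together. It assumes that DS already points to the data segment; the array names and values are hypothetical.

```asm
        .MODEL SMALL
        .DATA
Array1  DB   10, 20, 30         ; byte array
Array2  DW   100, 200           ; word array
        .CODE
        MOV  BX, OFFSET Array1  ; BX = offset of Array1 within its segment
        INC  BYTE PTR [BX]      ; PTR fixes the size: increment one byte (10 -> 11)
        MOV  BX, OFFSET Array2
        INC  WORD PTR [BX]      ; increment a 16-bit word (100 -> 101)
        MOV  AX, TYPE Array1    ; AX = 1 (bytes per element)
        MOV  CX, TYPE Array2    ; CX = 2 (bytes per element)
        MOV  DX, SEG Array1     ; DX = segment part of the address of Array1
```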
Example 5-2:
The following code shows how to display a message using INT 21H.
MSG DB "Hello$"
MOV DX,OFFSET MSG ; DX contains offset of message MSG
MOV AX,SEG MSG ; AX contains segment of MSG
MOV DS,AX ; DS:DX points to MSG
MOV AH,9 ; DOS function 9 - Display string MSG
INT 21H ; Call DOS service routine.
Using square brackets around EAX gives access to the information at the
address in EAX. A register enclosed in square brackets is effectively a
memory operand. The size of the data accessed at the address is determined
by the size of the register used to receive it. In the above example it is a 32-
bit value as it uses a 32-bit register but it can be done with 16- and 8-bit
values as well using the correct size register.
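The effect of the receiving register's size can be sketched as follows, assuming a 32-bit flat-model MASM program; Value32 and Demo are hypothetical names. The same address is read three times, and each read fetches as many bytes as the destination register holds.

```asm
        .386
        .MODEL FLAT, STDCALL
        .DATA
Value32 DD   12345678H
        .CODE
Demo    PROC
        MOV  EAX, OFFSET Value32 ; EAX = address of Value32
        MOV  EBX, [EAX]          ; 32-bit read: EBX = 12345678H
        MOV  CX,  [EAX]          ; 16-bit read: CX  = 5678H (low word)
        MOV  DL,  [EAX]          ; 8-bit read:  DL  = 78H (low byte)
        RET
Demo    ENDP
        END
```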
The so-called pointers are a special type of variable, which contain
addresses of other variables. Pointers are usually used in high-level
languages (like C and PASCAL) for passing addresses between subroutines
and performing other types of complex data manipulation.
We can then put the ADDRESS of a variable into the EAX register (LEA means
load effective address). When you put that ADDRESS into a variable of its
own, you'll have a pointer to the original variable:
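A minimal sketch of this idea, assuming a 32-bit flat-model MASM program (MyVar, pMyVar and Demo are hypothetical names):

```asm
        .386
        .MODEL FLAT, STDCALL
        .DATA
MyVar   DD   100            ; an ordinary variable holding the VALUE 100
pMyVar  DD   ?              ; a pointer: it will hold the ADDRESS of MyVar
        .CODE
Demo    PROC
        LEA  EAX, MyVar     ; EAX = address of MyVar
        MOV  pMyVar, EAX    ; store the address: pMyVar now points to MyVar
        MOV  EBX, pMyVar    ; load the pointer into a register...
        MOV  ECX, [EBX]     ; ...and dereference it: ECX = 100
        RET
Demo    ENDP
        END
```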
MyArray DB 29,14,23,10
This line allocates 4 consecutive bytes in RAM. The address of the first
byte element is MyArray, the address of the second byte is MyArray+1,
and so on. Similarly, in order to declare a vector of 100 byte elements,
whose initial values are 0, we can make use of the DUP directive as follows:
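For example (the names Vector100, Buffer and Grid are illustrative):

```asm
        .MODEL SMALL
        .DATA
Vector100 DB  100 DUP(0)        ; 100 bytes, each initialized to 0
Buffer    DW  50 DUP(?)         ; 50 uninitialized words
Grid      DB  4 DUP(3 DUP(0))   ; nested DUP: 4 x 3 = 12 zero bytes
        END
```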
Solution: Copy the zero-based index of the required (16th) member into
the register that you are using as the index, copy the address of the array
into the register that you are using as the base address, and then read the
value of the array member into another register.
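These three steps can be sketched as follows. The sketch assumes a hypothetical array DwArray of 32-bit members in a flat-model program, and takes the 16th member to have the zero-based index 15.

```asm
        .386
        .MODEL FLAT, STDCALL
        .DATA
DwArray DD   32 DUP(0)              ; hypothetical array of 32-bit members
        .CODE
Demo    PROC
        MOV  ECX, 15                ; index: zero-based index of the 16th member
        MOV  ESI, OFFSET DwArray    ; base: address of the array
        MOV  EAX, [ESI+ECX*4]       ; read the member (4 bytes each) into EAX
        RET
Demo    ENDP
        END
```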
These three lines of code read the required member of the array
into the EAX register. If you want to compare the 16th and 17th
members of the array without using an additional register, you can
add the required displacement to the address, so that you need only one
extra line of code.
My2DArray DB 20 DUP(0)
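In order to point to the element in row I and column J of an array with M
columns per row (indices counted from zero, elements stored row by row),
put its address in EAX, such that:
MOV ESI, OFFSET My2DArray ; ESI = base address of the array
MOV EAX, I ; EAX = row index
IMUL EAX, M ; EAX = I*M
ADD EAX, J ; EAX = I*M + J
ADD EAX, ESI ; EAX = address of element (I, J)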
Address Content
MEMORY
Vector1+3 00000101
Vector1+2 -----------
Vector1+1 -----------
Vector1 00000000
Low memory
Fig. 5-6. Arrangement of data arrays in the main memory.
Note that the starting address of Vector1 will be decided by the assembler
and will be the first available memory place inside the data segment, where
there is room for 4 consecutive bytes. The second array (Mat1) will
immediately follow, in the next available 6 consecutive bytes.
Using a table look up, however, allows you to reduce this sequence of
instructions to just four instructions:
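As an illustration of the four-instruction pattern, the following sketch translates a value in the range 0 to 15 into its ASCII hexadecimal digit using the XLAT instruction. The choice of this conversion, and the names HexTab and Digit, are assumptions made for the example; DS is assumed to point to the data segment.

```asm
        .MODEL SMALL
        .DATA
HexTab  DB   "0123456789ABCDEF" ; lookup table: entry n is the ASCII code of hex digit n
Digit   DB   0BH                ; hypothetical input value (0..15)
        .CODE
        MOV  AL, Digit          ; AL = index into the table
        MOV  BX, OFFSET HexTab  ; BX = base address of the table
        XLAT                    ; AL = DS:[BX+AL], here the character 'B'
        MOV  Digit, AL          ; store the translated character
```

The table replaces the compare-and-branch logic that would otherwise be needed to treat the digits 0-9 and the letters A-F separately.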
5-12.1. Queues
A queue is a list of records in which records are inserted at one end of the
list (tail of the list), and records are extracted and deleted from the other
end (head of the list). Thus, a queue has a First-In-First-Out (FIFO)
structure: records are removed from the list in the same order as they
arrive. An insertion of a record is said to en-queue it; similarly, deletion
de-queues a record. Note that a queue is different from a stack, which has a
Last-In-First-Out (LIFO) structure, such that data is added (pushed) or
deleted (popped) from one end (top of the stack).
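One common way to implement such a queue is a circular buffer with head and tail indices. The following is a minimal sketch under stated assumptions: one-byte records, a capacity of 16 records, DS pointing to the data segment, and hypothetical names (QUEUE, QHEAD, QTAIL, QCOUNT, ENQUEUE, DEQUEUE). Both procedures use BX as a scratch register and report failure through the carry flag.

```asm
        .MODEL SMALL
        .DATA
QSIZE   EQU  16                 ; capacity (a power of two, so AND can wrap the index)
QUEUE   DB   QSIZE DUP(?)       ; storage for the records (one byte each)
QHEAD   DW   0                  ; index of the next record to remove
QTAIL   DW   0                  ; index of the next free slot
QCOUNT  DW   0                  ; number of records currently stored
        .CODE
; ENQUEUE: insert the record in AL at the tail. CF=1 if the queue is full.
ENQUEUE PROC NEAR
        CMP  QCOUNT, QSIZE
        JE   QFULL
        MOV  BX, QTAIL
        MOV  QUEUE[BX], AL      ; store the record
        INC  BX
        AND  BX, QSIZE-1        ; wrap around at the end of the buffer
        MOV  QTAIL, BX
        INC  QCOUNT
        CLC                     ; success
        RET
QFULL:  STC                     ; failure: no room
        RET
ENQUEUE ENDP

; DEQUEUE: remove the record at the head into AL. CF=1 if the queue is empty.
DEQUEUE PROC NEAR
        CMP  QCOUNT, 0
        JE   QEMPTY
        MOV  BX, QHEAD
        MOV  AL, QUEUE[BX]      ; fetch the oldest record
        INC  BX
        AND  BX, QSIZE-1        ; wrap around at the end of the buffer
        MOV  QHEAD, BX
        DEC  QCOUNT
        CLC                     ; success
        RET
QEMPTY: STC                     ; failure: nothing to remove
        RET
DEQUEUE ENDP
```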
In a linked list, each record contains a link field which holds the address
of the next record in the list. The sequencing from one record of the list to
the next thus involves accessing the link field of each record. Therefore,
insertions and deletions of records involve only resetting of links. As
records may be located anywhere in memory, linked lists are appropriate
whenever dynamic allocation is needed. However, linked lists are not
needed for storage of static data like tables of constants. In order to
understand the idea of a linked list, consider the following list of names:
Ahmad at offset a, Badr at b, Camel at c, and Darsh at d. Each cell now
has 2 fields: info and link:
The link field in the last record, Darsh, has the special value '00' to mark
the end of the list. We draw this list with arrows as follows:
To delete the record Badr, change the link field in the record Ahmad so
that it holds the address of Camel:
To insert a new record Danny at address g between Camel and Darsh, set
the link of Danny to the address of Darsh and the link of Camel to the
address of Danny (g):
In memory, each link field holds a one-word offset in the data segment.
Thus, if the information fields occupy b bytes, then the length of each
record is b+2 bytes. Assume that the link field is located b bytes from the
beginning of the record; let us define the constant LINK:
LINK EQU b
Suppose BX holds the offset of (the first byte of) a record in a linked list.
Then [BX+LINK] specifies the link field of this record. To change BX to
point to the next record in the list:
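Stepping to the next record takes a single instruction, MOV BX,[BX+LINK]. The following sketch uses it in a loop that counts the records of a list. It assumes that a link value of 0 marks the end of the list, that DS points to the data segment, and that the information fields occupy 6 bytes; the value of LINK and the name CountRecs are hypothetical.

```asm
LINK    EQU  6                  ; assumed: 6 information bytes precede the link field

; CountRecs: on entry BX = offset of the first record (0 if the list is empty);
;            on exit  CX = number of records in the list.
CountRecs PROC NEAR
        XOR  CX, CX             ; count = 0
NEXT:   CMP  BX, 0              ; a link value of 0 marks the end of the list
        JE   DONE
        INC  CX                 ; one more record
        MOV  BX, [BX+LINK]      ; BX now points to the next record
        JMP  NEXT
DONE:   RET
CountRecs ENDP
```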
The use of linked lists arose in the mid-1950s in the course of artificial
intelligence research. The linked list is a fundamental data structure of the
LISP language, which is heavily used for artificial intelligence
programming. Many variations on the idea of linked cells have been
subsequently introduced. For example, a doubly linked list has both
forward and backward links to facilitate searching in the list. Binary
trees also use more than one link per record.
Let the array that holds the data be T(0), ..., T(n-1). The array T is called
a hash table. A hash function h transforms the information x into an
integer h(x) such that 0 ≤ h(x) ≤ n-1. The information x is then stored
at T(h(x)), together with any additional information fields associated with
x. If the record T(h(x)) is already in use, then a collision occurs, and x
must be stored elsewhere. A good hashing scheme minimizes the
frequency of collisions by scattering information into random locations in
the hash table. The choice of hash functions and the resolution of
collisions are discussed below.
In a binary tree, each record is stored in a node. For each node X, at most
one node Y is the left child of X and at most one node Z is the right child
of X. In other words, any node X may have 0, 1, or 2 children. X is the
common parent of nodes Y and Z. There is one node in the tree with no
parent; it is called the root of the tree.
The above figure depicts a binary tree, where each record has a link
pointing to its left child and a link pointing to its right child. For instance,
Badr is the left child of Darsh, and Lola is the right child of Darsh. Darsh
is the parent of both Badr and Lola. Ahmad is the root of the tree.
However, some applications of the binary tree may include a pointer from
each node to its parent.
Notice the arrangement of names in this tree. All names in the left subtree
of any node are lexicographically less than the name at that node; that is,
they would occur earlier in an alphabetic sort. All names in the right
subtree are lexicographically greater. For instance, the left subtree of
Frank is the tree rooted at David, and all names in this subtree are
lexicographically less than Frank. Thus the location of a record in the tree
expresses its relationship to other records.
Example 5-4.
Write down a string-copy subroutine, equivalent to the C-language
function strncpy(dest, src, len), where src and dest are the addresses of
the source and destination strings and len is the number of characters to
be copied.
Solution. The following example shows how this is done. Here src is the
address of the source buffer to copy, dest is the address of the
destination buffer and len is the byte count to copy.
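A minimal sketch of such a routine is given below. As in the other examples of this section, src, dest and len are assumed to be 32-bit variables (or procedure parameters) that hold the two addresses and the byte count, and ES and DS are assumed to address the same flat data space. Unlike the C function, this sketch always copies exactly len bytes: it neither stops at a zero byte nor pads the destination.

```asm
        CLD                 ; clear direction flag (DF=0) to copy forward
        MOV  ESI, src       ; ESI = address of the source buffer
        MOV  EDI, dest      ; EDI = address of the destination buffer
        MOV  ECX, len       ; ECX = number of bytes to copy
        REP  MOVSB          ; copy ECX bytes from [ESI] to [EDI]
```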
In this example, REP MOVSB copies each byte from the address in ESI to the
address in EDI, increments both addresses and decrements ECX. The exit
condition for the REP prefix is when ECX is decremented to zero. It is
assumed that the destination buffer (dest) is large enough to receive the
byte count from the source (src). When you copy a zero-terminated string,
you can instead write an algorithm that copies until it finds an ASCII zero.
Example 5-5.
Write down a string-copy subroutine, equivalent to the C-language
function strcpy(dest, src), where src and dest are the addresses of the
source and destination strings. The source string src is assumed to be
zero-terminated.
Solution.
The following algorithm shows how this is done.
CLD ; Clear direction flag (DF=0) to read forward
MOV ESI, src ; Put source string address into the source index
MOV EDI, dest ; Put destination string address into the destination index
BACK:
LODSB ; Load byte from source into AL and increment ESI
STOSB ; Write AL to destination and increment EDI
CMP AL, 0 ; See if the byte is an ASCII zero
JNE BACK ; Read the next byte if it is not
A trick that will make this algorithm run faster is to directly move each
byte from the source address (src) to AL and then from AL to the
destination address (dest). On Pentium and later processors, it is faster to
use MOV/INC than LODSB or STOSB. This is done by "dereferencing"
both ESI and EDI so that they function as memory addresses as shown in
the following example:
Example 5-6:
Show how to implement the above strcpy(dest, src) subroutine using
MOV and INC instead of LODS and STOS.
Solution:
The following algorithm shows how this is done.
MOV ESI, src ; Put source address into the source index
MOV EDI, dest ; Put destination address into the destination index
BACK:
MOV AL, [ESI] ; Copy byte at address in ESI to AL
INC ESI ; Increment address in ESI
MOV [EDI], AL ; Copy byte in AL to address in EDI
INC EDI ; Increment address in EDI
CMP AL, 0 ; See if the byte is an ASCII zero
JNE BACK ; Jump back and read next byte if not
It should be noted that the direction flag (DF) does not affect this method
and that you can use any 32-bit registers when you are not using the string
instructions. This code is longer but faster on Pentium-class processors,
whose two pipelines can execute simple instructions in pairs (this is
called pairing). When instructions go through the two pipelines in pairs,
the code runs nominally twice as fast. This simple algorithm uses only
small, simple instructions, so it runs faster than the shorter
algorithm with the older string instructions.
Then you can load the SI register with the address of some block of 256
bytes and, by issuing a CALL Init instruction, zero out the
specified block. However, in a macro-assembler environment, you don't
define your own procedures in this manner. Instead, you should use the
MASM PROC and ENDP assembler directives as follows:
Init PROC
XOR AX, AX
MOV CX, 128
ZLOOP: MOV [SI], AX
ADD SI, 2
LOOP ZLOOP
RET
Init ENDP
The x86 microprocessors support NEAR and FAR subroutines calls. The
NEAR calls and returns transfer control between procedures in the same
code segment. Far calls and returns pass control between different
segments. The PROC directive has an optional operand that is either near
or far. If the operand field is empty, then NEAR is assumed.
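The difference between the two kinds of procedure can be sketched in a small DOS program; the names NearSub, FarSub and START are illustrative assumptions.

```asm
        .MODEL SMALL
        .STACK 100H
        .CODE
NearSub PROC NEAR           ; NEAR (the default): called from the same code segment
        RET                 ; assembled as a near return: pops IP only
NearSub ENDP

FarSub  PROC FAR            ; FAR: may be called from a different code segment
        RET                 ; assembled as a far return: pops IP and CS
FarSub  ENDP

START:  CALL NearSub        ; near call: pushes IP
        CALL FAR PTR FarSub ; far call: pushes CS and IP
        MOV  AX, 4C00H      ; terminate program
        INT  21H
        END  START
```

Note that the same RET mnemonic is used in both procedures; the assembler selects the near or far form from the operand of the enclosing PROC directive.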
This ISR obviously does not preserve the machine state. Suppose you
were executing the following code segment when a hardware interrupt
transferred control to the above ISR:
MOV AX, 5
ADD AX, 2
INT nn ; Suppose the interrupt (that calls NaiveISR) occurs here.
:
PRINT
The interrupt service routine would set the AX register to zero and your
program would print zero rather than the value seven. Worse yet, hardware
interrupts are generally asynchronous, meaning they can occur at any
time and rarely do they occur at the same spot in a program. Therefore,
the code sequence above would print seven most of the time; once in a
great while it might print zero or two (it will print two if the interrupt
occurs between the MOV AX,5 and ADD AX,2 instructions). Bugs in
hardware interrupt service routines are very difficult to find, because such
bugs often affect the execution of unrelated code.
The solution to this problem, of course, is to make sure you preserve all
registers you use in the interrupt service routine for hardware interrupts
and exceptions. Finally, it should be pointed out that writing an ISR is only
the first step in implementing an interrupt handler. You must also
initialize the interrupt vector table entry with the address of your ISR.
There are two ways to accomplish this: directly store the address in the
interrupt vector table, or use a DOS call and let it do this task for you.
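A minimal sketch of an ISR that preserves the machine state is shown below. The name SafeISR is hypothetical, and the end-of-interrupt command is included on the assumption that the routine services a hardware interrupt routed through the master 8259 interrupt controller of the PC (command port 20H).

```asm
SafeISR PROC FAR
        PUSH AX             ; save every register that the routine modifies
        MOV  AX, 0          ; body of the handler (here it only changes AX)
        MOV  AL, 20H        ; assumption: hardware IRQ on the master 8259,
        OUT  20H, AL        ;   so send the end-of-interrupt (EOI) command
        POP  AX             ; restore the saved registers in reverse order
        IRET                ; return and restore IP, CS and the flags
SafeISR ENDP
```

Because AX is restored before IRET, the interrupted program continues with the same register contents it had when the interrupt occurred.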
Storing the address directly is an easy job. All you need to do is to load
a segment register (here ES) with zero (since the interrupt vector table is
situated in segment zero) and store the four-byte address at the
appropriate offset within that segment. The following code sequence
initializes the entry for interrupt 255 with the address of the interrupt
routine NaiveISR presented above:
MOV AX, 0
MOV ES, AX
PUSHF
CLI
MOV WORD PTR ES:[0FFH*4], OFFSET NaiveISR
MOV WORD PTR ES:[0FFH*4 + 2], SEG NaiveISR
POPF
This code turns off the interrupts while changing the interrupt vector
table. This is important if you are patching a hardware interrupt vector,
because the interrupt must not occur between the last two MOV
instructions above; at that point the interrupt vector is in an inconsistent
state and invoking the interrupt at that point would transfer control to the
offset of NaiveISR and the segment of the previous interrupt 0FFH
handler. This, of course, would be a disaster. Perhaps a better way to
initialize an interrupt vector is to use the DOS Set Interrupt Vector call.
Calling DOS with AH equal to 25H provides this function. This call
expects an interrupt number in the AL register and the address of the
interrupt service routine in DS:DX. The call to MS-DOS that would
accomplish the same thing as the code above is:
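A sketch of this call, using the register conventions just described, is given below. Saving and restoring DS around the call is an addition made here, on the assumption that the program needs its own data segment again afterwards.

```asm
        PUSH DS                     ; DS is changed below, so save it
        MOV  AX, 25FFH              ; AH = 25H (set vector), AL = 0FFH (interrupt number)
        MOV  DX, SEG NaiveISR
        MOV  DS, DX                 ; DS = segment of the ISR
        MOV  DX, OFFSET NaiveISR    ; DX = offset of the ISR
        INT  21H                    ; call DOS
        POP  DS                     ; restore the program's data segment
```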
Although this code is a little bit longer than writing the data directly into
the interrupt vector table, it is safer. Many programs monitor changes
made to the interrupt vector table through DOS. If you call DOS to
change an interrupt vector table entry, those programs will become aware
of your changes.
Parameters are values that you pass to and from a procedure. Pass by
name is the parameter passing mechanism used by macros.
For instance, consider the following MASM macro:
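A macro consistent with the expansion described in the text (the names Add12, Parameter1 and Parameter2 are the ones used there) can be sketched as:

```asm
Add12   MACRO Parameter1, Parameter2
        MOV  AX, Parameter1     ; the actual argument text replaces Parameter1
        ADD  AX, Parameter2     ; the actual argument text replaces Parameter2
        ENDM
```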
If you invoked the Add12 macro in the form: Add12 BX, CX, then
MASM emits the following code, substituting BX for Parameter1 and CX
for Parameter2:
MOV AX, BX
ADD AX, CX
You can place the COPYSTR macro at the beginning of your assembly
program and then invoke it as follows:
STRING1 DB "BLOCK1"
STRING2 DB "BLOCK2"
:
COPYSTR STRING1,STRING2,10
:
Note that the COPYSTR macro is invoked in the main program, with its
name followed by the actual parameters (STRING1, STRING2 and 10).
Example 5-8:
The following example demonstrates the creation and use of an assembler
macro (MOVE), which moves data from a location (B) to another (A),
and how it can be invoked from within an assembly program:
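A minimal sketch of such a macro and of its invocation is given below. It assumes word-sized operands; the variable names VAR1 and VAR2 and the label START are hypothetical. AX is used as a temporary register, because the x86 has no memory-to-memory MOV instruction, and is saved so that the macro has no side effects.

```asm
MOVE    MACRO A, B
        PUSH AX             ; preserve AX
        MOV  AX, B          ; AX = contents of B
        MOV  A, AX          ; A = AX
        POP  AX             ; restore AX
        ENDM

        .MODEL SMALL
        .STACK 100H
        .DATA
VAR1    DW   0
VAR2    DW   1234H
        .CODE
START:  MOV  AX, @DATA
        MOV  DS, AX
        MOVE VAR1, VAR2     ; expands to PUSH AX / MOV AX,VAR2 / MOV VAR1,AX / POP AX
        MOV  AX, 4C00H      ; terminate program
        INT  21H
        END  START
```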
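Example 5-9:
The following example demonstrates the creation and use of an assembler
macro (PRINT), to print out the IBM PC system time on the screen in
ASCII characters. Each value is printed as two decimal digits:
PRINT MACRO PARM
PUSHA ; Save all registers (80186 and later)
MOV AL,PARM ; Binary value to print (0-99)
AAM ; AH = tens digit, AL = units digit
ADD AX,3030H ; Convert both digits to ASCII
PUSH AX ; Keep the digits across the DOS call
MOV DL,AH ; Print the tens digit
MOV AH,02H
INT 21H
POP AX
MOV DL,AL ; Print the units digit
MOV AH,02H
INT 21H
POPA
ENDM
:
:
MOV AH,2CH ; DOS function 2CH - Get system time
INT 21H ; CH=hours, CL=minutes, DH=seconds, DL=1/100 s
PRINT CH
PRINT CL
PRINT DH
PRINT DL
END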
The TINY memory model mimics the 8080: all segments fit into one
segment of 64 kB. In the SMALL memory model there is a separate code
segment, and all other segments (data and stack) fit into one segment.
Other memory models permit the creation of larger programs with multiple
code segments. The LARGE model allows multiple code segments and
multiple data segments. The HUGE model is the same as the large model, but
also allows single data items (e.g., large double precision tables) to
exceed 64 kB. An illustration of these memory models is shown in figure 5-10.
MASM16 synopsis:
ml [-o outfile] infile.asm
Turbo Assembler synopsis:
tasm [-o outfile] infile.asm
tlink outfile [/t]
where infile is the assembly source filename and outfile is the object
filename. The /t switch makes a COM file. This will only work if the
memory model is declared as tiny in the source file. If you have an
assembler other than MASM16 or TASM (Turbo Assembler), then refer to
its instruction manual.
Note that MASM treats the "[ ]" symbols just like the "+" operator. This
operator is commutative, just like the "+" operator. Of course, this
discussion applies to all the 80x86 addressing modes, not just those
involving BX and SI. You may substitute any legal registers in all the
above addressing modes. The effective address (EA) is the final offset
produced by an addressing mode computation. For example, if BX
contains 10H, the effective address for 10H[BX] is 20H.
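The effective-address rule above can be sketched in C++ as follows. This is only an illustrative model, not assembler output: the function name effectiveAddress and its three parameters are assumptions made for this sketch, and a component that the addressing mode does not use is simply passed as zero.

```cpp
#include <cstdint>

// Effective address (offset) of a 16-bit x86 memory operand:
// EA = displacement + base + index, truncated to 16 bits.
// Pass 0 for any component that the addressing mode does not use.
std::uint16_t effectiveAddress(std::uint16_t disp,
                               std::uint16_t base,
                               std::uint16_t index)
{
    // 16-bit offsets wrap around at 64 kB, hence the cast back to uint16_t.
    return static_cast<std::uint16_t>(disp + base + index);
}
```

With BX = 10H, the call effectiveAddress(0x10, 0x10, 0) models 10H[BX]. Because the components are simply added, the order in which they are written does not matter, which is the commutativity mentioned above.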
You can memorize all these forms so that you know which are valid
(and, by omission, which forms are invalid). However, there is an easier
way besides memorizing these 17 forms. Consider the following chart:
[BX] [SI]
DISP ------ -----
[BP] [DI]
If you choose zero or one item from each of the three columns and end up
with at least one item, you have a valid 80x86 memory addressing
mode. For instance, if you choose disp from column one, nothing from
column two and [DI] from column three, you get disp[DI].
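The chart rule can be checked mechanically. The following C++ sketch (the function name validAddressingModes is an assumption of this example) builds every combination of zero or one item per column and discards the empty one, which should leave exactly the 17 forms mentioned above.

```cpp
#include <string>
#include <vector>

// Builds every combination of "zero or one item from each column" of the
// chart (DISP | [BX],[BP] | [SI],[DI]) and drops the empty combination.
std::vector<std::string> validAddressingModes()
{
    const std::vector<std::string> disp  = {"", "disp"};
    const std::vector<std::string> base  = {"", "[BX]", "[BP]"};
    const std::vector<std::string> index = {"", "[SI]", "[DI]"};

    std::vector<std::string> modes;
    for (const std::string& d : disp)
        for (const std::string& b : base)
            for (const std::string& i : index) {
                const std::string mode = d + b + i;
                if (!mode.empty())          // at least one item is required
                    modes.push_back(mode);
            }
    return modes;
}
```

There are 2 x 3 x 3 = 18 combinations, and removing the empty one gives 17.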
Example 5-10.
The following program demonstrates how to write to the screen using the
file function 40H of interrupt 21H. The program makes use of the small
memory model, in which all segments (except the stack) fit into one
segment.
TITLE Example10.asm
.MODEL SMALL
.STACK
.CODE
MOV AX,@DATA ; SETUP DS AS DATA SEGMENT
MOV DS,AX
MOV AH,40H ; FUNCTION 40H - WRITE FILE
MOV BX,1 ; HANDLE = 1 (SCREEN)
MOV CX,14 ; LENGTH OF STRING (14 CHARACTERS)
MOV DX,OFFSET TEXT ; DS:DX POINTS TO STRING
INT 21H ; CALL DOS SERVICE ROUTINE
MOV AX,4C00H ; TERMINATE PROGRAM
INT 21H
.DATA
TEXT DB "THIS IS A TEXT"
END
Example 5-11.
The next program shows how to set up and call function 13H of interrupt
10H - write string. This has the advantages of being able to write a string
anywhere on the screen in a specified color but it is hard to set up. The
program also makes use of the small memory model.
TITLE Example11.ASM
.MODEL SMALL
.STACK
.CODE
MOV AX,@DATA ; GET THE SEGMENT ADDRESS OF THE DATA
MOV ES,AX ; PUT THIS IN ES
MOV BP,OFFSET TEXT ; ES:BP POINTS TO MESSAGE
MOV AH,13H ; FUNCTION 13 - WRITE STRING
MOV AL,01H ; ATTRIBUTE IN BL, MOVE CURSOR
XOR BH,BH ; VIDEO PAGE 0
MOV BL,5 ; ATTRIBUTE - MAGENTA
MOV CX,14 ; LENGTH OF STRING (14 CHARACTERS)
MOV DH,5 ; ROW TO PUT STRING
MOV DL,5 ; COLUMN TO PUT STRING
INT 10H ; CALL BIOS SERVICE ROUTINE
MOV AX,4C00H ; RETURN TO DOS
INT 21H
.DATA
TEXT DB "THIS IS A TEXT"
END
Example 5-12.
The next program demonstrates how to write to the screen using REP
STOSW to put the writing in video memory.
TITLE Example12.ASM
.MODEL SMALL
.STACK
.CODE
MOV AX,0B800H ; SEGMENT OF VIDEO BUFFER
MOV ES,AX ; PUT THIS INTO ES
XOR DI,DI ; CLEAR DI, ES:DI POINTS TO VIDEO MEMORY
MOV AH,4 ; ATTRIBUTE - RED
MOV AL,"G" ; CHARACTER TO PUT THERE
MOV CX,2000 ; NUMBER OF WORDS TO STORE (80 x 25 CELLS)
CLD ; DIRECTION - FORWARDS
REP STOSW ; OUTPUT CHARACTER AT ES:[DI]
MOV AX,4C00H ; RETURN TO DOS
INT 21H
END
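To make the layout of the text-mode video buffer more concrete, here is a small C++ model of it. It is a sketch under the assumption of the standard 80 x 25 colour text mode with the buffer at B800:0000. The names makeCell, cellOffset and cellsPerPage are invented for this illustration and do not touch real video memory.

```cpp
#include <cstdint>

const int kColumns = 80;   // assumed 80 x 25 colour text mode
const int kRows    = 25;

// One screen cell is a 16-bit word: attribute in the high byte (AH),
// character code in the low byte (AL), which is what STOSW stores from AX.
std::uint16_t makeCell(char ch, std::uint8_t attribute)
{
    return static_cast<std::uint16_t>((attribute << 8) |
                                      static_cast<std::uint8_t>(ch));
}

// Byte offset of (row, column) from the start of the video buffer.
int cellOffset(int row, int column)
{
    return (row * kColumns + column) * 2;
}

// Number of words needed to fill one full screen page.
int cellsPerPage()
{
    return kColumns * kRows;
}
```

For example, makeCell('G', 4) yields the word 0447H (red "G"), and one page of 80 x 25 cells occupies 2000 words, i.e. 4000 bytes.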
Example 5-13.
The next program makes use of the TINY memory model, in which all
segments fit into one segment of 64 kB. The assembly program
demonstrates some simple input and output operations.
TITLE Example13.ASM
.MODEL TINY
.CODE
ORG 100H
START:
MOV DX,OFFSET MESSAGE ; DISPLAY MESSAGE ON SCREEN
MOV AH,9 ; USING FUNCTION 09H
INT 21H ; OF INTERRUPT 21H
MOV DX,OFFSET PROMPT ; DISPLAY MESSAGE ON SCREEN
MOV AH,9 ; USING FUNCTION 09H
INT 21H ; OF INTERRUPT 21H
JMP FIRST_TIME
PROMPT_AGAIN:
MOV DX,OFFSET ANOTHER ; DISPLAY MESSAGE On SCREEN
MOV AH,9 ; USING FUNCTION 09H
INT 21H ; OF INTERRUPT 21H
FIRST_TIME:
MOV DX,OFFSET AGAIN ; DISPLAY MESSAGE ON SCREEN
MOV AH,9 ; USING FUNCTION 09H
INT 21H ; OF INTERRUPT 21H
XOR AH,AH ; FUNCTION 00H OF
INT 16H ; INTERRUPT 16H GETS A CHAR
MOV BL,AL ; SAVE TO BL
MOV DL,AL ; MOVE AL TO DL
MOV AH,02H ; FUNCTION 02H - DISPLAY CHAR
INT 21H ; CALL DOS SERVICE
CMP BL,'Y' ; IS THE CHARACTER 'Y'?
JE PROMPT_AGAIN ; IF YES THEN DISPLAY IT AGAIN
CMP BL,'y' ; IS THE CHARACTER 'y'?
JE PROMPT_AGAIN ; IF YES THEN DISPLAY IT AGAIN
THEEND:
MOV DX,OFFSET GOODBYE ; PRINT GOODBYE MESSAGE
MOV AH,9 ; USING FUNCTION 09
INT 21H ; OF INTERRUPT 21H
MOV AH,4CH ; TERMINATE PROGRAM
INT 21H
.DATA
CR EQU 13 ; ENTER CHARACTER
LF EQU 10 ; LINE-FEED CHARACTER
MESSAGE DB "A SIMPLE ASSEMBLY PROGRAM$"
PROMPT DB CR,LF,"HERE IS YOUR FIRST PROMPT.$"
It should be noted that, if you'd like to generate a *.COM file (which fits
inside a single 64 kB segment), or an *.EXE file that can be easily
converted to a *.COM file (using the EXE2BIN program), proceed as follows:
1) Give the first instruction a label like START, and make sure that
the final statement is END START.
3) Take all variables and tables and move them into the CODE segment.
In fact, you cannot have a separate DATA segment.
6) When you link your program, the linker may issue a
warning that there is no STACK segment. Ignore this warning.
So for each library, you use the include file that matches it. To find a
function that you need, look in the system include file to see which file
has the function prototype and include the matching library. Most of the
common functions are in the following three system DLLs:
where infile is the assembly source filename and outfile (if specified) is
the object filename. If outfile is not specified, yasm will derive a default
output file name from the name of its input file, usually by appending .o
or .obj, or by removing all extensions for a raw binary file. If errors or
warnings are discovered during execution, Yasm outputs the error
message to stderr (the terminal). Many options may be given in one of
two forms: either a dash followed by a single letter, or two dashes
followed by a long option name.
The last option selects the parser (the assembler syntax). The default
parser is "nasm", which emulates the syntax of NASM, the Netwide
Assembler.
As usual, most of these fields are optional; the presence or absence of any
combination of a label, an instruction and a comment is allowed. The
BITS directive specifies whether YASM should generate code designed
to run on a processor operating in 16-bit, 32-bit, or 64-bit modes. The
syntax is BITS 16, BITS 32, or BITS 64. Alternatively, USE16, USE32,
and USE64 directives can be used in place of BITS 16, BITS 32, and
BITS 64 respectively for compatibility with other assemblers. Another
available parser is GAS, which emulates the syntax of GNU AS (GAS).
5-24. Summary
The DEBUG program, which is supplied with the disk operating system
(DOS) of the IBM PC, can be used to write and execute short assembly
programs. When the DEBUG program is started, it responds with its own
hyphen "-" prompt.
When the hyphen prompt appears, DEBUG is waiting for you to enter one of
its single-letter commands, followed by the appropriate parameters.
The DEBUG program, though simple, is not suited to editing long
assembly programs. Assemblers such as MASM simplify this job, and make
it easy to edit and save assembly programs.
5-25. PROBLEMS
5-1) Examine the following assembler code and explain the
meaning of each pseudo op.
WY DW 1000
WZ DW 1234H,0ABCDH
TEMP DW ?
SCORES DB 10 DUP(0)
TIMES DW 7 DUP(?)
TOP EQU 13
5-2) Use the DEBUG program, to find out the memory address of DOS
Timer Function. Show how to list the first 10 lines (80 bytes) of this
function, using your DEBUG program.
5-3) Show how to use the DEBUG program to find out the date of the
BIOS of your PC, given that the BIOS date is stored at addresses
F000:FFF5 through F000:FFFC.
5-4) Show how to use the “-L” command of the DEBUG program to load
the boot sector of a hard disk
5-5) Write a template file, which may be used to generate any assembly
program, using MASM macro assembler.
5-6) Explain all the interrupts that are supported by the 8086
microprocessor, giving a brief description of each.
5-7) Explain the term "Vectored Interrupts", give an example of its use
and describe how the 8086 microprocessor obtains the address of an
interrupt vector from its type number.
5-10) Write a program that displays the time of day in the following
format: 8:15 P.M., Friday, February 11, 2005. Make use of INT 21H to
display a character on the screen.
5-13) Find out what the following program does. Rewrite the program
with comments.
PRINT MACRO PARM8
PUSHA
PUSH AX
MOV AL, PARM8
AAM
ADD AL, 3030H
PUSH AX
MOV AH, 02
INT 21
POP AX
MOV DL,AH
MOV AH,02
INT 21
POPA
ENDM
HR:
DB "Hours"
DB "Min"
MOV AH,44
INT 21
PUSH CX
PUSH DX
PRINT CH
MOV CL,5
LEA SI, Hours
NEXT1:
LODSB
MOV DL,AL
MOV AH,02
INT 02
INC SI
LOOP NEXT1
POP CX
PRINT CL
MOV CL,3
LEA SI, Min
NEXT2:
LODSB
MOV DL,AL
MOV AH,02
INT 21
INC SI
LOOP NEXT2
POP DX
PRINT DH
END
Introduction to Microprocessors & Interface Circuits CHAPTER 6
Writing Assembly Routines within C/C++ and Java Programs
Contents
6-1. Introduction
6-2. General Considerations (16-bit , 32-bit and 64-bit programs)
6-2.1. Using YASM assembler within Visual Studio and VC++
6-2.2. I/O Software Layers
6-2.3. I/O in DOS, and Windows
6-2.4. Direct Memory Access (ActiveX and all that Stuff)
6-3. C-Programming Language (Summary)
6-4. C++ and Object-Oriented Programming
6-4.1. Object-Oriented Programming (OOP)
6-4.2. Classes in C++
6-4.3. Specific Operators in C++
6-4.4. Input / Output in C++
6-4.5. FILE Input / Output in C++
6-4.6. Inheritance in C++
6-4.7. Polymorphism in C++
6-4.8. Abstract Classes in C++
6-4.9. Operator Overloading
6-4.10. Friend Functions in C++
6-4.11. Generic Types (Templates) in C++
6-4.12. Additional Notes about C++
6-4.13. Common Problems in C/C++
6-4.14. C++11
6-5. Programming under Windows
6-5.1. Windows Messaging System
6-5.2. Writing Windows DLL in C/C++ and Assembly Languages
Writing Assembly Routines within C, C++ and Java Programs
6-1. Introduction
Assembly language gives direct control over the processor and, when
written carefully, can produce faster code than high-level languages.
However, in order to write a huge software system, it is more practical
to use a high-level language, like C or C++, and to use assembly language
only where you need efficient I/O routines. One of the old jokes about
assembly language goes something like this: "There are three reasons for
using assembly language: speed, speed, and more speed." Even those who
absolutely hate assembly language will admit that, if speed is your
primary concern, calling assembly I/O routines from within a high-level
language is the best way to go.
The following figure depicts the C++ source code of a cout statement and
shows how the C/C++ syntax is much easier to read than its equivalent
assembly routine and binary code. The code first writes out the "H"; the
same operations have to be repeated for each letter, "e", "l", "l", "o".
If you look up the ASCII code of "H" you will find that it is 48
(hexadecimal); substituting 65 gives "e", 6C gives "l", 6F gives "o",
and so on.
Fig. 6-1. Piece of a C++ program and its equivalent binary code
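The character codes quoted above can be verified with a few lines of C++. The helper name asciiHex is an assumption of this sketch, which simply prints each character code as a two-digit hexadecimal number.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Returns the ASCII codes of the characters in text as two-digit
// hexadecimal numbers separated by spaces.
std::string asciiHex(const std::string& text)
{
    std::string result;
    char buffer[4];
    for (std::size_t i = 0; i < text.size(); ++i) {
        std::snprintf(buffer, sizeof buffer, "%02X",
                      static_cast<unsigned>(static_cast<unsigned char>(text[i])));
        if (i != 0) result += ' ';
        result += buffer;
    }
    return result;
}
```

Calling asciiHex("Hello") gives the string 48 65 6C 6C 6F, which matches the codes listed in the text.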
Fortunately, Microsoft Visual C/C++ does not rely on the contents of the
AX, BX, CX, DX and ES registers of the x86 microprocessors being
preserved by an assembly routine. Therefore, we are able to use them
freely in assembly routines. In order to use any other register of the
microprocessor, we first save (PUSH) its contents on the stack. After we
are done, we have to restore (POP) its original contents from the stack.
6-2.1. Using YASM Assembler with Visual Studio and VC++
At first, you need to locate the directory where the VC++ compiler
binaries are located and put a copy of yasm.exe in this directory. Yasm
executable binaries that are not named yasm.exe will need to be renamed
yasm.exe after being placed in the appropriate directory. On a win32
system the win32 version of Yasm has to be used. On an x64 system
either the 32 or the 64 bit versions can be used but the rules file is set up
to use the 32 bit version. The win32 Yasm should be placed in the 32-bit
VC++ binary directory, which is typically located at:
If needed the 64-bit Yasm binary should be placed in the 64-bit tools
binary directory, which is typically at:
Fig. 6-2(b). Operating system layered structure, in old and recent PCs.
VxD = Virtual Device Driver, VMM = Virtual Machine Manager
In the early days of DOS, the only way to create such speed-hungry
applications was to access the hardware directly, bypassing DOS. In fact,
DOS gave no dedicated support for such multimedia devices. On the
other hand, the Windows API provided a suitable means to develop
multimedia applications in a seamless manner. For instance, the graphics
functions were grouped in the graphics device interface (GDI), which is a
subsystem of Windows. This made life easier for programmers, but it
resulted in a dramatic decrease in the execution speed of applications.
So, multimedia and game developers were forced to write their
applications using special direct hardware access interfaces, which do
not carry all the burden of the Windows API. The most famous direct
hardware access technologies are:
1- DirectX,
2- OpenGL, and
3- Glide
1- DirectDraw
2- Direct3D,
3- DirectSound,
4- DirectInput
5- DirectPlay
6- DirectMusic
As shown in the above listing, the first part of the main() function
contains the declarations of the variables that we intend to use in our
C program. Note that each C statement should be terminated with a
semicolon ";". Also, the body of each function, as well as any group of
C statements, should be enclosed as a block between braces { }.
In C, integer literal constants that begin with "0x" are hexadecimal
constants. When converting such a value from C to assembly, replace the
"0x" prefix with the notation of your assembler. Assemblers that use a
"$" prefix write the C literal constant "0x1234A" as "$1234A", whereas
MASM uses an "H" suffix, giving "1234AH". A character literal constant in
C and assembly usually consists of a single character surrounded by
quotes, e.g., 'a' and 'z'. The C language defines three different
floating-point sizes: float, double, and long double. Some compilers
(e.g., Borland) use a 10-byte extended-precision format for long double,
while others (e.g., Microsoft) use an eight-byte double-precision format.
The C language does not support a string type. Instead, C uses an array of
characters with a zero terminating byte to represent a character string. On
the other hand, Assembly defines a character string type. Fortunately,
assembly string format is compatible with the zero-terminated string
format that C uses, so it is easy to convert assembly strings into C format.
Both languages use double quotes to represent a string literal constant,
like "MHS".
Examples 6-1:
Note that the array is a collection of variables, which hold the same type
of data. In C, arrays start at position 0. Also, an array can be initialized
when declared. For instance: int Z[5] = {1, 20, 33, 4, 50}; This means
that Z[0]=1; Z[1]=20; Z[2]=33; Z[3]=4; and Z[4]=50.
6-3.3. Expressions
In C, an expression is anything that evaluates to a value. An expression
followed by a semicolon forms a statement, e.g., y = x + 5;
6-3.4. Operators
C offers many arithmetic, logical and relational operators. The
following tables depict the different types of operators in C.
Example 6-2:
if (I >= 5) printf("I is greater than or equal to 5\n");
Example 6-3:
if (I < 5) printf("I = %d\n", I); else printf("I is greater than or equal to 5\n");
Example 6-4:
for (i = 1; i < 10; i++) printf("i = %d\n", i);
Example 6-5:
int I = 1;
do { printf("i = %d\n", I); I++; } while (I < 10);
Example 6-6:
int I = 1;
while (I < 10) { printf("i = %d\n", I); I++; }
Example 6-7:
float cube (float x ); // takes a float and returns a float value
void printxy(float x, float y); // takes 2 floats, doesn't return any value
Example 6-8:
float cube (float x )
{
float y;
y = x*x*x;
return y;
}
C) Function Call:
If the function returns a value, it may be assigned to another variable:
Examples 6-9:
float x, y;
y = cube (x ); // call the cube() function and assign its return value to y.
printxy(x, y); // just call the printxy() function; it returns no value.
struct employee {
char name [30];
int code ;
float salary;
};
or as follows:
typedef struct {
char name [30];
int code;
float salary;
} employee;
After creation of this new data type (structure) you can use it to create
new structure variables as follows:
int main ( )
{
employee Engineer ;
return 0;
}
Structure_name.member_name
Example 6-12:
type * pointer_name;
Examples 6-13:
Example 6-14:
#include <stdio.h>
#include <string.h>
typedef struct
{
char name [30]; float salary; int code;
} employee;
int main(void)
{
int Num, I;
int Code;
char Name [30];
float Salary;
employee Engineer [10];
employee *ptr_Engineer;
printf("Enter Number of Engineers [<=10]: ");
scanf("%d", &Num);
if (Num > 10) Num = 10; // do not run past the end of the array
ptr_Engineer = Engineer; // point to first Engineer structure
for (I = 0; I < Num; I++)
{
printf("Input Data of Engineer Number [%d]\n", I);
printf("1- Input Engineer Name: "); scanf("%29s", Name);
printf("2- Input Engineer Code: "); scanf("%d", &Code);
printf("3- Input Engineer Salary: "); scanf("%f", &Salary);
strcpy (ptr_Engineer->name, Name);
ptr_Engineer->code = Code;
ptr_Engineer->salary = Salary;
ptr_Engineer++; // point to following employee structure.
}
/* ... */
return 0;
}
B) Output to Screen
The C language provides powerful statements for screen output, such as:
Example 6-16:
printf() is a formatted output statement. The part of the string that
begins with % in the printf() call is called the format specifier. The
general form of a format specifier is:
%[flags][width][.precision][length]type
The plus sign will include the sign specifier for the number; such that
printf( "%+d\n", 10 );
will print +10. Finally, the minus sign will cause the output to be
left-justified. This is important if you use the width specifier and you
want the padding to appear at the end of the output instead of the
beginning. Thus
printf( "|%-5d|%-5d|\n", 1, 2 );
will print |1    |2    |.
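The effect of the plus and minus flags can be observed without a console by formatting into a buffer. The three helper names below are assumptions of this sketch, and snprintf() is used only so that the formatted text can be returned as a string.

```cpp
#include <cstdio>
#include <string>

// "%+d": always show the sign of the number.
std::string withSign(int value)
{
    char buffer[32];
    std::snprintf(buffer, sizeof buffer, "%+d", value);
    return std::string(buffer);
}

// "%-5d": field of width 5, padding placed after the number.
std::string leftJustified(int value)
{
    char buffer[32];
    std::snprintf(buffer, sizeof buffer, "|%-5d|", value);
    return std::string(buffer);
}

// "%5d": field of width 5, padding placed before the number (the default).
std::string rightJustified(int value)
{
    char buffer[32];
    std::snprintf(buffer, sizeof buffer, "|%5d|", value);
    return std::string(buffer);
}
```

The bars in the last two formats are only there to make the padding visible when the results are compared.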
B) Output to a String
Examples 6-17:
int A; char Ch; char Str[32] = "25 x";
sscanf(Str, "%d %c", &A, &Ch); // read A and Ch from the string Str
sprintf(Str, "%d %c", A, Ch); // write A and Ch into the string Str
Example 6-18:
FILE *fp;
if ((fp = fopen("test", "w")) == NULL)
{ printf("cannot open file\n"); exit(1); }
The following table illustrates the available modes for file I/O in C-
language:
Mode Meaning
r Read from a text file ( default )
w Write to a text file ( default )
a Append to a text file ( default )
rb Read from a binary file
wb Write to a binary file
ab Append to a binary file
r+ Open a text file for read/write
w+ Create a text file for read/write
a+ Append a text file for read/write
r+b Open a binary file for read/write
w+b Create a binary file for read/write
a+b Append a binary file for read/write
Example 6-19:
Assume fp is a pointer to a FILE structure (a stream). In order to read
(get) characters from the file stream, you may use the fgetc() function as
follows:
int Ch; // int rather than char, so that EOF can be told apart from data
do { Ch = fgetc(fp); } while (Ch != EOF);
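As a slightly fuller illustration of reading a stream character by character, the following sketch counts the characters in a file. The function names are assumptions of this example, and a temporary file created with tmpfile() stands in for a real data file.

```cpp
#include <cstdio>

// Counts the characters in an open stream by reading until EOF.
// The result of fgetc() is kept in an int: EOF is a negative int value
// that a plain char cannot reliably distinguish from a valid byte.
long countCharacters(std::FILE* fp)
{
    long count = 0;
    int ch;
    while ((ch = std::fgetc(fp)) != EOF)
        ++count;
    return count;
}

// Helper for demonstration: writes text to a temporary file, rewinds it
// and counts the characters read back. Returns -1 if no file is available.
long countViaTemporaryFile(const char* text)
{
    std::FILE* fp = std::tmpfile();
    if (fp == nullptr) return -1;
    std::fputs(text, fp);
    std::rewind(fp);
    const long count = countCharacters(fp);
    std::fclose(fp);
    return count;
}
```

Testing the loop condition before using the character also avoids processing the EOF marker itself as if it were data.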
C) Output to a File
Example 6-20:
Assume fp is a pointer to a file stream. You can use feof() to detect the
end of file, as follows.
char str[128];
while (!feof(fp)) { if (fgets(str, sizeof(str), fp) == NULL) break; }
Origin Meaning
SEEK_SET Beginning of file
SEEK_CUR Current position
SEEK_END End of file
Example 6-21:
The following routine prints strings that are read from myfile at
intervals of 128 bytes:
FILE *fp; char str[128];
fp = fopen(myfile, "r"); // myfile holds the name of the file
while (fp != NULL && !feof(fp))
{ fseek(fp, 128, SEEK_CUR); if (fgets(str, sizeof(str), fp) == NULL) break; puts(str); }
However, you can direct input/output from/to standard devices using the
redirection characters (< , >) in the command line.
Examples 6-22:
# include <stdio.h>
# include "myfile.h"
Examples 6-23:
# define TRUE 1
# define FALSE 0
# define min(a,b) (((a) < (b)) ? (a) : (b))
# define max(a,b) (((a) > (b)) ? (a) : (b))
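Function-like macros are expanded as plain text, so missing parentheses can change the meaning of the surrounding expression. The following sketch compares a fully parenthesized macro with an under-parenthesized one. The macro and function names are chosen for this illustration only.

```cpp
// Fully parenthesized versions: each argument and the whole expression are
// wrapped, so the expansion keeps its meaning inside larger expressions.
#define MIN_OF(a, b) (((a) < (b)) ? (a) : (b))
#define MAX_OF(a, b) (((a) > (b)) ? (a) : (b))

// Deliberately under-parenthesized variant, for comparison only.
#define MIN_UNSAFE(a, b) (a) < (b) ? (a) : (b)

// Expands to 2 * (((x) < (y)) ? (x) : (y)).
int doubledMinSafe(int x, int y)   { return 2 * MIN_OF(x, y); }

// Expands to 2 * (x) < (y) ? (x) : (y), i.e. ((2 * x) < y) ? x : y.
int doubledMinUnsafe(int x, int y) { return 2 * MIN_UNSAFE(x, y); }
```

With x = 3 and y = 5 the safe version yields 6, while the unsafe one evaluates (6 < 5) ? 3 : 5 and yields 5. Note also that no space may appear between the macro name and the opening parenthesis of its parameter list in the #define line.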
#pragma asm
:
#pragma endasm
#include <stdio.h>
#include <time.h>
int RDTSC (void) ; // Read the processor's Time-Stamp Counter (low 32 bits)
volatile time_t t;
#pragma aux RDTSC = ".586" "rdtsc" modify [eax edx] value [eax];
Encapsulation
Inheritance
Polymorphism
Encapsulation is the mechanism that binds code and the data together,
and keeps both safe from outside interference and misuse. One way to
think about encapsulation is as a protective wrapper that prevents the
code and data from being arbitrarily accessed by other code defined
outside the wrapper. Access to the code and data inside the wrapper is
tightly controlled through a well-defined interface. Conclusion: The
wrapping up of data and methods into a single unit (called class) is
known as encapsulation.
Example 6-25
class Point {
int _x, _y; // member variables (point coordinates)
public: // member functions (methods)
void setX (const int val);
void setY (const int val);
int getX() { return _x; }
int getY() { return _y; }
};
The class data members are sometimes called the class variables, and the
class member functions are sometimes called methods. Data members
(variables) and member functions (methods) may be given one of three
access modifiers: public, protected and private. Public members can
be used everywhere in the program, without restriction. Private members,
however, can only be accessed by the member functions of the class itself
(and by its friends). If not specified, class members are private by
default. Member functions (methods) have full access to all data members
of the class. They may be defined inside the class (inline definition),
as shown above for getX(), or outside the class (out-of-line definition),
as follows for setX():
void Point::setX(const int val)
{
_x = val;
}
All C++ classes have one or more special member functions, called
"constructors," that are called to initialize objects. If you don't
specify a constructor function in your class definition, the compiler
generates a default constructor with no arguments.
Instances of classes are called objects. An object of a certain class is just
an instance (variable) of this class type. For example, you can declare an
object of the Point class and call its members as follows:
You can also create pointers to certain class objects and arrays of objects
in much the same manner as you do with structures.
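A minimal, self-contained sketch of declaring an object of class Point and calling its members might look as follows. The default constructor and the helper function sumOfCoordinates are assumptions added for this illustration, and the accessors are marked const, which the class shown earlier does not do.

```cpp
class Point {
    int _x, _y;                       // private by default
public:
    Point() : _x(0), _y(0) {}         // assumed default: start at the origin
    void setX(const int val);         // declared here, defined outside the class
    void setY(const int val);
    int getX() const { return _x; }   // defined inline
    int getY() const { return _y; }
};

void Point::setX(const int val) { _x = val; }
void Point::setY(const int val) { _y = val; }

// Small helper that exercises the public interface of an object.
int sumOfCoordinates(int x, int y)
{
    Point p;          // p is an object (instance) of class Point
    p.setX(x);        // members are reached with the dot operator
    p.setY(y);
    Point* ptr = &p;  // a pointer to an object uses the arrow operator
    return ptr->getX() + ptr->getY();
}
```

Direct access such as p._x = 3 from outside the class would be rejected by the compiler, because _x is private.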
6-4-3. Class Constructors and Destructors
Constructors are methods which are used to initialize an object at its
definition time. We extend our class Point such that it initializes a point
to coordinates (0, 0):
class Point {
int _x, _y;
public:
Point() { _x = _y = 0; } // constructor
~Point() { } // destructor
void setX(const int val);
void setY(const int val);
int getX() { return _x; }
int getY() { return _y; }
};
Constructors have the same name as the class. They have no return
type. Like other functions, constructors can take arguments.
When we leave the scope of the definition of the Point object, we must
ensure that the allocated memory is released. We therefore define a
special method called destructor, which is called for each object at its
destruction time. Destructors are declared similar to constructors. They
also use the name of the defining class prefixed by a tilde (~).
6-4-4. Specific Operators in C++
C++ has some specific operators and keywords, such as this, new, and
delete. The new operator is used for dynamic memory allocation; it
returns a pointer to the allocated memory. The delete operator destroys
the object that such a pointer refers to and releases its memory.
6-4-5. Input / Output in C++
C++ has a distinct I/O library, whose facilities are available through
the iostream classes. When you include the <iostream> header (iostream.h
in pre-standard compilers) in your file, you can use cin and cout for
console input/output. Thus, you can use the input stream cin to read
data from the standard console as follows:
int A; cin >> A;
int A, B; cin >> A >> B ;
Note the use of the >> operator to read data from an input stream. Also,
you can use cout to output data to the standard console as follows:
int A; cout << A;
int A, B; cout << A << B ;
Note the use of the << operator to write data to an output stream. You
can also open and close files in different modes using the derived classes
ifstream and ofstream (declared in the <fstream> header) and their
associated functions, as follows:
#include <fstream>
using namespace std;
int main()
{
ifstream fin; // fin is an ifstream object
fin.open (filename, mode); // mode is a combination of ios flags, e.g. ios::in
:
fin.close();
:
ofstream fout; // fout is an ofstream object
fout.open (filename, mode); // e.g. ios::out | ios::app
:
fout.close();
:
return 0;
}
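The open, use and close sequence can be tried out with a short round trip. In this sketch the function name writeThenRead and the idea of deleting the file afterwards are assumptions made for the demonstration. The stream classes and the open-mode flags are those of the standard <fstream> header.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Writes one line to the named file, reads it back and returns it.
// Returns an empty string if the file cannot be opened.
std::string writeThenRead(const char* filename, const std::string& line)
{
    std::ofstream fout;
    fout.open(filename, std::ios::out | std::ios::trunc);
    if (!fout.is_open()) return "";
    fout << line << '\n';
    fout.close();

    std::ifstream fin;
    fin.open(filename, std::ios::in);
    if (!fin.is_open()) return "";
    std::string result;
    std::getline(fin, result);
    fin.close();

    std::remove(filename);   // clean up the demonstration file
    return result;
}
```

Checking is_open() after each open() call is a simple way to detect a missing file or a directory without write permission before any I/O is attempted.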
The following table depicts the available modes for file I/O in C++:
Table 6-11. File I/O modes in C++ language.
Mode Meaning
ios::create Create a new file
ios::app Append
The member functions associated with the get and put pointers are
seekg() and seekp(), which move the get and put pointer, respectively,
to the specified position. Both seek methods take an offset that is
relative to the beginning of the file (ios::beg), the end of the file
(ios::end), or the current position (ios::cur). tellg() and tellp()
return the current location of the get and put pointers. The following
lines clear up most questions:
Example 6-26:
seekg(0); seekg(0,ios::beg); // sets the get pointer to the beginning.
seekg(5,ios::beg); // sets the get pointer 5 chars forward of the beginning.
tellp(); tellg(); // return the current value of the put/get pointer
seekp(-10,ios::end); // sets the put pointer to 10 chars before the end
seekp(1,ios::cur); // advances the put pointer to the next char
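The same seek and tell calls work on any standard stream, so they can be explored on an in-memory string stream instead of a file. The three function names below are assumptions of this sketch.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Reads one character at the given offset from the beginning of the text.
char charFromBeginning(const std::string& text, int offset)
{
    std::istringstream in(text);
    in.seekg(offset, std::ios::beg);
    return static_cast<char>(in.get());
}

// Reads one character located `back` positions before the end of the text.
char charFromEnd(const std::string& text, int back)
{
    std::istringstream in(text);
    in.seekg(-back, std::ios::end);
    return static_cast<char>(in.get());
}

// Position of the get pointer after skipping `count` characters.
long positionAfterSkipping(const std::string& text, int count)
{
    std::istringstream in(text);
    in.seekg(count, std::ios::cur);
    const std::streamoff position = in.tellg();
    return static_cast<long>(position);
}
```

For the text "ABCDEFGHIJ", an offset of 5 from the beginning selects 'F', and an offset of 3 back from the end selects 'H'.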
i. Types of Inheritance
You might notice the keyword public used in the first line of the class
definition (its signature). This is necessary because C++ distinguishes
two types of inheritance: public and private. By default, classes are
privately derived from each other. Consequently, we must explicitly tell
the compiler to use public inheritance. The type of inheritance influences
the access rights to elements of the various superclasses. Using public
inheritance, everything which is declared private in a superclass remains
private in the subclass. Similarly, everything which is public remains
public. When using private inheritance the things are quite different as is
shown in the following table.
The leftmost column lists possible access rights for elements of classes. It
also includes a third type protected. This type is used for elements which
are directly usable in subclasses but are not accessible from the outside.
The second and third columns show the access rights of the elements of a
superclass when the subclass is privately and publicly derived,
respectively.
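The difference between the two kinds of derivation can be illustrated with a small sketch. The class names Base, PublicDerived and PrivateDerived are hypothetical, and std::is_convertible from the <type_traits> header is used only to ask the compiler whether outside code may treat a subclass object as a Base.

```cpp
#include <type_traits>

class Base {
public:
    int publicValue() const { return 1; }
protected:
    int protectedValue() const { return 2; }
};

// Public inheritance: public members of Base stay public in the subclass.
class PublicDerived : public Base {
public:
    int sum() const { return publicValue() + protectedValue(); }
};

// Private inheritance: inherited members become private in the subclass,
// so they are reachable only from inside PrivateDerived.
class PrivateDerived : private Base {
public:
    int sum() const { return publicValue() + protectedValue(); }
};

// Outside code may treat a PublicDerived as a Base, but not a PrivateDerived.
const bool kPublicIsABase =
    std::is_convertible<PublicDerived*, Base*>::value;
const bool kPrivateIsABase =
    std::is_convertible<PrivateDerived*, Base*>::value;
```

Both subclasses can use the public and protected members of Base internally. Only the publicly derived one also exposes publicValue() to the rest of the program.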
ii- Inherited Class Construction
When we create an instance of class Point3D its constructor is called.
Since Point3D is derived from Point the constructor of class Point is also
called. However, this constructor is called before the body of the
constructor of class Point3D is executed. In general, prior to the
execution of a particular constructor body, constructors of all superclasses
are called to initialize their part of the created object. For instance
This dynamic initialization can also be used with built-in data types. For
example, the constructors of class Point could be written as:
Point() : _x(0), _y(0) {}
Point(const int x, const int y) : _x(x), _y(y) {}
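The order of constructor calls described above can be made visible with a small log. In this sketch the global string constructionLog and the member _z are assumptions. Point3D is derived from Point as in the text, and it passes x and y to the superclass constructor through its initializer list.

```cpp
#include <string>

std::string constructionLog;   // records the order of constructor bodies

class Point {
    int _x, _y;
public:
    Point(const int x, const int y) : _x(x), _y(y) { constructionLog += "Point;"; }
    int getX() const { return _x; }
    int getY() const { return _y; }
};

class Point3D : public Point {
    int _z;
public:
    // The base-class part is initialized first, through the initializer list.
    Point3D(const int x, const int y, const int z) : Point(x, y), _z(z)
    {
        constructionLog += "Point3D;";
    }
    int getZ() const { return _z; }
};
```

After an object such as Point3D p(1, 2, 3) has been created, the log reads "Point;Point3D;", confirming that the superclass constructor runs before the body of the subclass constructor.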
class DrawableObject {
public:
virtual void print(); //
};
The virtual method print() will be overridden and defined later in derived
classes. For instance, the derived class Point can define print() as follows:
Any other function, like display() which is able to display any kind of
DrawableObject, can then call the function print(), as follows:
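A possible shape of this arrangement is sketched below. It is not the original listing: here print() takes an output stream argument and display() returns the printed text as a string, both of which are assumptions made so that the example is self-contained and easy to check.

```cpp
#include <sstream>
#include <string>

class DrawableObject {
public:
    virtual ~DrawableObject() {}
    virtual void print(std::ostream& out) const { out << "DrawableObject"; }
};

class Point : public DrawableObject {
    int _x, _y;
public:
    Point(int x, int y) : _x(x), _y(y) {}
    // Overrides the virtual method of the base class.
    void print(std::ostream& out) const
    {
        out << "Point(" << _x << ", " << _y << ")";
    }
};

// Works for any DrawableObject; the call to print() is resolved at run time
// according to the actual class of the object.
std::string display(const DrawableObject& object)
{
    std::ostringstream out;
    object.print(out);
    return out.str();
}
```

Although display() only knows the type DrawableObject, passing it a Point produces the text written by Point::print().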
class Color {
public:
virtual ~Color();
};
class Red : public Color {
public:
~Red(); // Virtuality inherited from Color
};
class LightRed : public Red {
public:
~LightRed();
};
Color *palette[3];
palette[0] = new Red; // Dynamically create a new Red object
palette[1] = new LightRed;
palette[2] = new Color;
The newly introduced operator new creates a new object of the specified
type in dynamic memory and returns a pointer to it. Thus, the first new
returns a pointer to an allocated object of class Red and assigns it to the
first element of array palette. The elements of palette are pointers to
Color and, because Red is-a Color the assignment is valid. The operator
delete explicitly destroys an object referenced by a pointer. If we apply
delete to the elements of palette the following destructor calls happen:
delete palette[0];
// Call destructor ~Red() followed by ~Color()
delete palette[1];
// Call ~LightRed(), ~Red() and ~Color()
delete palette[2];
// Call ~Color()
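These chains of destructor calls happen only because the destructors are
virtual. If we had not declared them virtual, each delete would at best
have called only ~Color() (because palette[i] is of type pointer to
Color); in standard C++, deleting a derived object through a base-class
pointer without a virtual destructor is undefined behaviour.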
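The destructor sequence can be observed by letting each destructor append its name to a log. The global string destructionLog and the helper function below are assumptions of this sketch. The three classes follow the Color hierarchy shown above.

```cpp
#include <string>

std::string destructionLog;   // records which destructors ran, in order

class Color {
public:
    virtual ~Color() { destructionLog += "~Color;"; }
};

class Red : public Color {
public:
    ~Red() { destructionLog += "~Red;"; }        // virtuality inherited from Color
};

class LightRed : public Red {
public:
    ~LightRed() { destructionLog += "~LightRed;"; }
};

// Creates a LightRed object, deletes it through a pointer to Color and
// returns the sequence of destructor calls.
std::string destroyLightRedThroughBasePointer()
{
    destructionLog.clear();
    Color* shade = new LightRed;
    delete shade;
    return destructionLog;
}
```

The returned log is "~LightRed;~Red;~Color;": the most derived destructor runs first, followed by those of its superclasses.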
class DrawableObject {
...
public:
...
virtual void print() = 0;
};
This class definition forces every derived class from which objects
are to be created to define a method print(). Such method declarations
are called pure virtual methods (or simply pure methods). Pure methods
must be declared virtual, because they are meant to be called through
objects of derived classes. Classes which declare pure methods are
called abstract classes.
6-4-9. Operator Overloading
If we recall the abstract data type for complex numbers, Complex, we can
create a C++ class as follows:
class Complex {
double _real, _imag;
public:
Complex() : _real(0), _imag(0) {}
Complex(const double real, const double imag) : _real(real), _imag(imag) {}
Complex add(const Complex op);
Complex mul(const Complex op);
...
};
Here we assign c the sum of a and b. What we should rather use is the "+"
operator to express the addition of two complex numbers. Fortunately, C++
allows us to overload almost all of its operators for newly created types.
For example, we could define a "+" operator for our class Complex as
follows:
class Complex {
...
public:
...
Complex operator +(const Complex &op) {
double real = _real + op._real,
imag = _imag + op._imag;
return(Complex(real, imag));
}
...
};
c = a + b;
c = a.operator + (b);
Thus, the binary operator '+' only needs one argument. The first argument
is implicitly provided by the invoking object (in this case a). However,
an operator call can also be interpreted as a usual function call,
class Complex {
public:
double real() const { return _real; }
double imag() const { return _imag; }
// No need to define operator here!
};
Complex operator +(const Complex &op1, const Complex &op2) {
double real = op1.real() + op2.real(),
imag = op1.imag() + op2.imag();
return(Complex(real, imag));
}
In this case we must define access methods for the real and imaginary
parts because the operator is defined outside of the class's scope.
However, the operator is so closely related to the class, that it would
make sense to allow the operator to access the private members. This can
be done by declaring it as a friend of class Complex.
You should not use friends very often because they break the data
abstraction principle. If you have to use friends very often it is always a
sign that it is time to restructure your inheritance graph.
In the first line we introduce the keyword template which starts every
template declaration. The arguments of a template are enclosed in angle
brackets. Each argument specifies a placeholder in the following class
definition. In our example, we want class List to be defined for various
data types. One could say, that we want to define a class of lists. In this
case the class of lists is defined by the type of objects they contain. We
use the name T for the placeholder. We now use T at any place where the
type of the actual objects is expected. For example, each list provides a
method to append an element to it. We can now define this method with
T. An actual list definition must now specify the type of the list. If we
stick to the class expression, we have to create a class instance. From this
class instance we can then create "real" object instances:
List<int> integerList;
Here we create a class instance of a List which takes integers as its data
elements. We specify the type enclosed in angle brackets. The compiler
applies the provided argument "int" and generates a class definition
where the placeholder T is replaced by int, for example, it generates the
following method declaration for append():
Templates can take more than one argument to provide more
placeholders. For example, to declare a dictionary class which provides access
to its data elements by a key, one can think of the following declaration:
#include <iostream>
typedef bool Bool; // standard C++ already provides bool, true and false
template <class Type> class List; // forward declaration
template <class Type>
class ListElem {
public:
ListElem (const Type elem) : val(elem) {prev = next = 0;}
Type& Value (void) {return val;}
ListElem * Prev (void) {return prev;}
ListElem * Next (void) {return next;}
friend class List<Type>; // one-to-one friendship
protected:
Type val; // the element value
ListElem *prev; // previous element in the list
ListElem *next; // next element in the list
}; //---------------------------------------------------------
extern "C" {
#include <stdlib.h>
}
extern "C" {
some_c_function();
}
6-4.14. C++11
C++11 is a major revision of the C++ standard. It fixes many bugs and adds
many language features, such as the auto keyword and lambda
(inline) expressions. In C++11, you don't need to provide the type of a
variable if the compiler can determine its type from its initialization. For
example, you can write a piece of code like this:
int x = 3;
auto y = x;
Fig. 6-5. Block diagram of a typical Windows application program and its interaction
with Windows via messages. (Blocks shown: WinMain( ) containing the message loop,
WndProc( ), Exit( ), and the Windows messages passed between them.)
If you don't want to wait, you can use PeekMessage instead. This
function returns immediately, and its return value tells you whether a
message was available at all. If the return value of GetMessage equals 0,
a WM_QUIT message has been received. Using DispatchMessage you forward
the message, by way of the OS, to the window procedure.
Resources are data which are linked in the program file. This feature is
used to include icons, menus, and multiple language support. In order to
create resources, you need a resource script (*.RC). It describes the
resources to be linked in your *.exe file. You can create resource scripts
using a text editor or a resource editor. It's compiled together with the
data to a *.res or *.obj file, which then gets passed to the linker.
To use Win32 functions, you have to include the required *.lib files in
the program. While TASM stores all functions in import32.lib, MASM
has a separate LIB for every DLL. That means that if you use MASM,
you have to check what DLL contains the function you need.
__asm
{
MOV AL, 2
MOV DX, 0xD007
OUT DX, AL
}
Because the __asm keyword is a statement separator, you can also put
assembly instructions on the same line:
__asm MOV AL, 2 __asm MOV DX, 0xD007 __asm OUT DX, AL
All three examples generate the same code, but the first style
(enclosing the __asm block in braces) has some advantages. The braces
clearly separate assembly code from C/C++ code and avoid needless
repetition of the __asm keyword. Braces can also prevent ambiguities. If
you want to put a C/C++ statement on the same line as an __asm block,
you must enclose the block in braces. Without braces, the compiler
cannot tell where assembly code stops. Finally, because the text in braces
has the same format as MASM text, you can cut and paste text from
existing MASM source files.
Unlike braces in C/C++, the braces enclosing an __asm block don't affect
the variable scope.
struct first_type {
char *weasel; int same_name;
};
struct second_type
{
int waq; long same_name;
};
All references to the member same_name must use the variable name
because same_name is not unique. But the member weasel has a
unique name, so you can refer to it using only its member name:
__asm
{
MOV EBX, OFFSET hal
MOV ECX, [EBX] hal.same_name ; Must use 'hal'
MOV ESI, [EBX].weasel ; Can omit 'hal'
}
Note that omitting the variable name is merely a coding convenience. The
same assembly instructions are generated whether or not the variable
name is present. You can access data members in C++ without regard to
access restrictions. However, you cannot call member functions.
6-6.4. Writing Functions with Inline Assembly
If you write a function with inline assembly code, it's easy to pass
arguments to the function and return a value from it. The following
examples compare a function first written for a separate assembler and
then rewritten for the inline assembler. The function, called power2,
receives two parameters, multiplying the first parameter by 2 to the power
of the second parameter. Written for a separate assembler, the function
might look like this:
; POWER.ASM
; Compute the power of an integer
PUBLIC _power2
_TEXT SEGMENT WORD PUBLIC 'CODE'
_power2 PROC
PUSH EBP ; Save EBP
MOV EBP, ESP ; Move ESP into EBP so we can
; refer to arguments on the stack
_power2 ENDP
_TEXT ENDS
END
// POWER2.C
#include <stdio.h>
You can use #pragma warning to disable the generation of this warning.
Labels defined in __asm blocks are not case sensitive; both goto
statements and assembly instructions can refer to those labels without
regard to case. C and C++ labels are case sensitive only when used by
goto statements. Assembly instructions can jump to a C or C++ label
without regard to case. The following code shows all the permutations:
Because exit is the name of a C library function, this code might cause a
jump to the exit function instead of to the desired location. As in MASM
programs, the dollar symbol ($) serves as the current location counter. It
is a label for the instruction currently being assembled. In __asm blocks,
its main use is to make long conditional jumps.
#include <stdio.h>
char format[] = "%s %s\n";
char hello[] = "Hello";
char world[] = "WORLD";
void main( void )
{
__asm
{
MOV EAX, OFFSET world
PUSH EAX
MOV EAX, OFFSET hello
PUSH EAX
MOV EAX, OFFSET format
PUSH EAX
CALL printf
//clean up the stack so that main can exit cleanly
//use the unused register EBX to do the cleanup
POP EBX
POP EBX
POP EBX
}
}
Because function arguments are passed on the stack, you simply push the
needed arguments (string pointers, in the previous example) before
calling the function. The arguments are pushed in reverse order, so they
come off the stack in the desired order. To emulate the C statement
printf( format, hello, world );
this example pushes pointers to world, hello, and format, in that order, and
then calls printf.
return 0;
}
Now consider the following example, which reads one character from the
keyboard and displays it on the screen if it is between '0' and '9'.
In this program we make use of INT 21h to call various DOS functions.
For instance, the keyboard input function is called by loading the
accumulator high byte, AH, with 8H and then calling INT 21h. Also, the
video output function is called by loading AH with 2H and then calling
INT 21h again. Note that if the input character is below '0' or above '9', the
assembly routine invokes conditional jump instructions (JB, which means
jump if below, and JA, which means jump if above) to transfer control to
an external location (the label CORNER) outside the assembly block.
It should be noted that DOS function calls (via INT 21h) are not
available to 32-bit Windows applications. So, if you'd like to perform
console input/output in a 32-bit Windows application, use the
console functions _getch() to input characters (bytes) or _putch() to
display characters (bytes).
This tells the compiler that a field named "gear" exists, holds numerical
data, and has an initial value of "1". Other examples are as follows:
In addition to the eight primitive data types listed above, the Java
programming language also provides special support for character strings
via the java.lang.String class. Enclosing your character string within
double quotes will automatically create a new String object; for
example, String s = "this is a string"; String objects
are immutable, which means that once created, their values cannot be
changed. The String class is not technically a primitive data type, but
you may think of it as such.
You can use enum types when you need to represent a fixed set of
constants. This includes natural enum types such as the solar system
planets, the choices on a menu and data sets where you know all possible
values at compile time.
You may also place the square brackets after the array name:
Another way to create an array is by the new operator. The next statement
allocates an array with ten integer elements and assigns the array to the
myArray variable.
int cadence = 0;
The following table summarizes all the Java operators and their
precedence.
Table 6-15. Java operators (listed from highest to lowest precedence)
Operators              Precedence
postfix                expr++ expr--
unary                  ++expr --expr +expr -expr ~ !
multiplicative         * / %
additive               + -
shift                  << >> >>>
relational             < > <= >= instanceof
equality               == !=
bitwise AND            &
bitwise exclusive OR   ^
bitwise inclusive OR   |
logical AND            &&
logical OR             ||
ternary                ? :
assignment             = += -= *= /= %= &= ^= |= <<= >>= >>>=
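As a rough illustration of how precedence affects evaluation, the following sketch computes a few expressions whose results depend on the order given in the table. The class name PrecedenceDemo and its method names are made up for this example.

```java
// Illustrative sketch only: the class and method names are hypothetical.
class PrecedenceDemo {
    // Multiplicative operators bind more tightly than additive ones.
    static int mulBeforeAdd() {
        return 2 + 3 * 4;            // parsed as 2 + (3 * 4)
    }

    // Additive operators bind more tightly than shift operators.
    static int addBeforeShift() {
        return 1 << 2 + 1;           // parsed as 1 << (2 + 1)
    }

    // Relational and equality operators bind more tightly than &&,
    // and && binds more tightly than ||.
    static boolean logical(int x) {
        return x < 0 || x > 10 && x % 2 == 0;   // (x < 0) || ((x > 10) && ((x % 2) == 0))
    }

    public static void main(String[] args) {
        System.out.println(mulBeforeAdd());     // prints 14
        System.out.println(addBeforeShift());   // prints 8
        System.out.println(logical(12));        // prints true
    }
}
```

When in doubt, parentheses make the intended grouping explicit and cost nothing at run time.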
int cadence = 0;
myArray[0] = 100;
int j = 1 * 2 * 3;
System.out.println("Condition is true.");
} // end block one
else
{ // begin block 2
System.out.println("Condition is false.");
} // end block 2
}
}
void applyBrakes() {
if (isMoving) { // the "if" clause: bicycle must be moving
currentSpeed--; // the "then" clause: decrease current speed
}
else
{
System.err.println("The bicycle has already stopped!");
}
}
class SwitchDemo
{
public static void main(String[] args)
{
int month = 8;
switch (month) {
case 1: System.out.println("January"); break;
case 2: System.out.println("February"); break;
case 3: System.out.println("March"); break;
case 4: System.out.println("April"); break;
case 5: System.out.println("May"); break;
case 6: System.out.println("June"); break;
case 7: System.out.println("July"); break;
case 8: System.out.println("August"); break;
case 9: System.out.println("September"); break;
case 10: System.out.println("October"); break;
case 11: System.out.println("November"); break;
case 12: System.out.println("December"); break;
default: System.out.println("Invalid month.");break;
}
}
}
while (expression)
{
statement(s)
}
while (true)
{
// your code goes here
}
do
{
statement(s)
} while (expression);
The difference between do-while and while is that do-while evaluates its
expression at the bottom of the loop instead of the top. Therefore, the
statements within the do block are executed at least once.
When using this version of the for statement, keep in mind that:
The initialization expression initializes the loop; it is executed once, as the loop begins.
The loop terminates when the termination expression evaluates to false.
The increment expression is invoked after each iteration of the loop.
class ForDemo {
public static void main(String[] args)
{
for(int i=1; i<11; i++)
{
System.out.println("Count is: " + i);
}
}
}
The three expressions of the for loop are optional; an infinite loop can
be created as follows:
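A minimal sketch of such a loop is given here. Because a for statement with all three expressions omitted never terminates by itself, this example leaves it with a break statement. The class name InfiniteForDemo and the method countUpTo are assumptions made for illustration.

```java
// Illustrative sketch only: the class and method names are hypothetical.
class InfiniteForDemo {
    // Counts from 0 up to limit using a for loop with no expressions.
    static int countUpTo(int limit) {
        int count = 0;
        for ( ; ; ) {                // no initialization, termination, or increment
            if (count >= limit) {
                break;               // the only way out of this loop
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("Count is: " + countUpTo(10));
    }
}
```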
class BreakDemo {
public static void main(String[] args) {
int[] arrayOfInts = { 32, 87, 3, 589, 12, 1076, 2000, 8, 622, 127 };
int i; int searchfor = 12;
boolean foundIt = false;
for (i = 0; i < arrayOfInts.length; i++)
{
if (arrayOfInts[i] == searchfor) { foundIt = true; break; }
}
if (foundIt) {
System.out.println("Found " + searchfor+ " at index " + i);
} else
{ System.out.println(searchfor + " not in the array");
}
}
}
return ++count;
i. Classes in Java
Here is sample code for a possible implementation of a Bicycle class, to
give you an overview of a class declaration. For the moment, don't
concern yourself with the details.
class MyClass {
//field, constructor, and method declarations
}
This is a class declaration. The class body (the area between the braces)
contains all the code that provides for the life cycle of the objects created
from the class: constructors for initializing new objects, declarations for
the fields that hold the state of the class and its objects, and
methods to implement the behavior of the class and its objects. In general,
class declarations can include these components, in order:
The fields of Bicycle are named cadence, gear, and speed and are all of
type integer. The public keyword identifies these fields as public
members, accessible by any object that can access the class.
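A possible Bicycle class with these three public fields might be sketched as follows. The constructor and the method names (setCadence, setGear, applyBrake, speedUp) are assumptions for illustration and are not taken from the text above.

```java
// Illustrative sketch only: the constructor and method names are assumptions.
class Bicycle {
    // the three public fields named in the text
    public int cadence;
    public int gear;
    public int speed;

    // one constructor that initializes all three fields
    public Bicycle(int startCadence, int startSpeed, int startGear) {
        cadence = startCadence;
        speed = startSpeed;
        gear = startGear;
    }

    // methods that change the state of the object
    public void setCadence(int newValue) { cadence = newValue; }
    public void setGear(int newValue) { gear = newValue; }
    public void applyBrake(int decrement) { speed -= decrement; }
    public void speedUp(int increment) { speed += increment; }
}
```

Public fields keep the example short; in larger programs the fields are usually declared private and exposed through methods.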
The basic elements of a method declaration are the method name, return
type, parentheses, and a body between braces, {}. Generally, method
declarations have six components, in order:
1. Modifiers, such as public, private and protected.
2. The return type: the data type of the value returned by the method, or void if the method does not return a value.
3. The method name.
4. The parameter list in parentheses.
5. An exception list.
6. The method code body, enclosed between braces.
Although a method (or function) name can be any legal identifier, code
conventions restrict method names. By convention, method names should
be a verb in lowercase or a multi-word name that begins with a verb in
lowercase, followed by adjectives, nouns, etc. In multi-word names, the
first letter of each of the second and following words should be
capitalized. Here are some examples:
runFast
getBackground
getFinalData
setX
isEmpty
Overloaded methods are differentiated by the number and the type of the
arguments passed into the method. In the code sample, draw(String s) and
draw(int i) are distinct methods because they require different argument
types. You cannot declare more than one method with the same name
and the same number and type of arguments, because the compiler cannot
differentiate between them. The compiler does not consider return type
when differentiating methods.
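The following sketch illustrates overloading with a hypothetical DataArtist class. The class name, the returned strings, and the third draw method are assumptions made for illustration, since the code sample referred to above is not reproduced here.

```java
// Illustrative sketch only: the class name and return values are hypothetical.
class DataArtist {
    // Same method name, different parameter lists: the compiler selects
    // the version whose parameters match the arguments of the call.
    String draw(String s) { return "drawing text: " + s; }
    String draw(int i) { return "drawing integer: " + i; }
    String draw(int i, double f) { return "drawing pair: " + i + ", " + f; }

    // Declaring "int draw(int i)" as well would not compile, because
    // the return type is not used to tell overloaded methods apart.

    public static void main(String[] args) {
        DataArtist artist = new DataArtist();
        System.out.println(artist.draw("hello"));
        System.out.println(artist.draw(42));
        System.out.println(artist.draw(1, 2.5));
    }
}
```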
This creates space in memory for the object and initializes its fields.
Although Bicycle only has one constructor, it could have others, including
a no-argument constructor:
You cannot write two constructors that have the same number
and type of arguments for the same class, because the compiler won't be
able to tell them apart. You do not have to provide a constructor for
your class, but you should be careful when omitting one. The compiler
automatically provides a no-argument, default constructor for any class
without constructors. This default constructor will call the no-argument
constructor of the superclass.
Note that inside the method, corners is treated like an array. The method can be
called either with an array or with a sequence of arguments. The method
code will treat the parameter as an array in all cases. You will most
commonly see varargs with the printing methods; for example, the printf
method, which allows you to print an arbitrary number of objects. It can
be called as follows:
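A small sketch of a varargs method and the ways it can be called is given here. The class VarargsDemo and its sum method are hypothetical names chosen for illustration; only System.out.printf is a standard library method.

```java
// Illustrative sketch only: VarargsDemo and sum are hypothetical names.
class VarargsDemo {
    // "values" is declared with the varargs syntax (int...),
    // but inside the method it is used as an ordinary int[] array.
    static int sum(int... values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum());                   // no arguments: empty array
        System.out.println(sum(1, 2, 3));            // a sequence of arguments
        System.out.println(sum(new int[] {4, 5}));   // an explicit array
        // printf is itself a varargs method: any number of objects may follow the format
        System.out.printf("%s has %d items%n", "list", 3);
    }
}
```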
Inside the method, circle initially refers to myCircle. The method changes
the x and y coordinates of the object that circle references (i.e., myCircle)
by 23 and 56, respectively. These changes will persist when the method
returns. Then circle is assigned a reference to a new Circle object with
x=y=0. This reassignment has no permanence, because the reference was
passed in by value and cannot change. Within the method, the object
pointed to by circle has changed, but, when the method returns, myCircle
still references the same Circle object as before the method was called.
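The behavior described above can be sketched as follows. The wrapper class PassByValueDemo, the minimal Circle class, and the starting coordinates (10, 20) are assumptions for illustration.

```java
// Illustrative sketch only: the Circle class and starting values are assumptions.
class PassByValueDemo {
    static class Circle {
        int x, y;
        Circle(int x, int y) { this.x = x; this.y = y; }
    }

    static void moveCircle(Circle circle, int deltaX, int deltaY) {
        // changes the object that both "circle" and the caller's variable refer to
        circle.x += deltaX;
        circle.y += deltaY;
        // rebinds only the local copy of the reference; the caller is unaffected
        circle = new Circle(0, 0);
    }

    public static void main(String[] args) {
        Circle myCircle = new Circle(10, 20);
        moveCircle(myCircle, 23, 56);
        // myCircle still refers to the original object, now moved
        System.out.println(myCircle.x + ", " + myCircle.y);   // prints 33, 76
    }
}
```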
6-7.10. Java Objects
In a typical Java program, you create many objects, which interact by
invoking methods.
Through object interactions, a program can carry out various tasks, such
as sending and receiving information over a network. Once an object has
completed its work, its memory resources should be recycled for use by
other objects. Here's a small program that creates three objects: one Point
object and two Rectangle objects. The program displays information
about various objects.
The following sections use the above example to describe the life cycle of
an object within a program. From them, you will learn how to write code
that creates and uses objects in your own programs. You will also learn
how the system cleans up after an object when its life has ended.
i. Creating Objects
As you know, a class provides the blueprint for objects; you create an
object from a class. Each of the following statements taken from the
CreateObjectDemo program creates an object and assigns it to a variable:
The first line creates an object of the Point class, and the second and third
lines each create an object of the Rectangle class. Each of these
statements has three parts:
Declaration: The part of each statement before the equals sign is a variable
declaration that associates a variable name with an object type.
Instantiation: The new keyword is a Java operator that creates the object.
Initialization: The new operator is followed by a call to a constructor,
which initializes the new object.
type name;
This notifies the compiler that you will use name to refer to data whose
type is type. With a primitive variable, this declaration also reserves the
proper amount of memory for the variable. You can also declare a
reference variable on its own line. For example: Point originOne; If you
declare originOne like this, its value will be undetermined until an object
is actually created and assigned to it. Simply declaring a reference
variable does not create an object. For that, you need to use the new
operator, as described in the next section. You must assign an object to
originOne before you use it in your code. Otherwise, you will get a
compiler error. A variable in this state, which currently references no
object, can be illustrated as follows (the variable name, originOne, plus a
reference pointing to nothing):
iii. Instantiating a Class
The new operator instantiates a class by allocating memory for a new
object and returning a reference to that memory. The new operator also
invokes the object constructor. The new operator requires a single, postfix
argument: a call to a constructor. The name of the constructor provides
the name of the class to instantiate.
The result of executing this statement can be illustrated in the next figure:
Here's the code for the Rectangle class, which contains four constructors:
public class Rectangle {
public int width = 0;
public int height = 0;
public Point origin;
// four constructors
public Rectangle() {
origin = new Point(0, 0); }
public Rectangle(Point p) {
origin = p; }
public Rectangle(int w, int h) {
origin = new Point(0, 0); width = w; height = h; }
public Rectangle(Point p, int w, int h) {
origin = p; width = w; height = h; }
// a method for moving the rectangle
public void move(int x, int y) {
origin.x = x; origin.y = y; }
// a method for computing the area of the rectangle
public int getArea() { return width * height; }
}
Each constructor lets you provide initial values for the rectangle's origin,
height, and width, using both primitive and reference types. If a class has
multiple constructors, they must have different signatures. The Java
compiler differentiates the constructors based on the number and the type
of the arguments.
When the Java compiler encounters the following code, it calls the
constructor in the Rectangle class that requires a Point argument followed
by 2 integer arguments:
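A call of this kind can be sketched as follows. The wrapper class ConstructorDemo, the minimal Point class, and the argument values are assumptions for illustration; the Rectangle constructors mirror the four shown above.

```java
// Illustrative sketch only: ConstructorDemo, Point, and the values used are assumptions.
class ConstructorDemo {
    static class Point {
        int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    static class Rectangle {
        int width = 0, height = 0;
        Point origin;
        Rectangle() { origin = new Point(0, 0); }
        Rectangle(Point p) { origin = p; }
        Rectangle(int w, int h) { origin = new Point(0, 0); width = w; height = h; }
        Rectangle(Point p, int w, int h) { origin = p; width = w; height = h; }
        int getArea() { return width * height; }
    }

    public static void main(String[] args) {
        Point originOne = new Point(23, 94);
        // a Point followed by two ints selects Rectangle(Point, int, int)
        Rectangle rectOne = new Rectangle(originOne, 100, 200);
        // two ints alone select Rectangle(int, int)
        Rectangle rectTwo = new Rectangle(50, 100);
        System.out.println(rectOne.getArea());   // prints 20000
        System.out.println(rectTwo.origin.x);    // prints 0
    }
}
```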
All classes have at least one constructor. If a class does not explicitly
declare any, the Java compiler automatically provides a no-argument
constructor, called the default constructor. This default constructor calls
the class parent's no-argument constructor, or the Object constructor if the
class has no other parent. If the parent has no constructor (Object does
have one), the compiler will reject the program.
or
objectReference.methodName();
The Rectangle class has two methods: getArea() to compute the rectangle
area and move() to change the rectangle's origin.
A nested class is a member of its enclosing class and has access to other
members of the enclosing class, even if they are declared private. As a member
of OuterClass, a nested class can be declared private, public, protected, or
package private. Recall that outer classes can only be declared public or
package private.
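The access rule can be sketched as follows. The field name secret and the two nested class names are hypothetical. Note that an inner (non-static) class reads the private field of its enclosing instance directly, whereas a static nested class has no enclosing instance and therefore needs an object reference.

```java
// Illustrative sketch only: the member names are hypothetical.
class OuterClass {
    private int secret = 42;      // private member of the enclosing class

    // An inner (non-static nested) class can use the private field directly.
    class InnerClass {
        int readSecret() { return secret; }
    }

    // A static nested class reaches the private field through a reference.
    static class StaticNestedClass {
        int readSecret(OuterClass outer) { return outer.secret; }
    }

    public static void main(String[] args) {
        OuterClass outer = new OuterClass();
        OuterClass.InnerClass inner = outer.new InnerClass();
        System.out.println(inner.readSecret());                          // prints 42
        System.out.println(new StaticNestedClass().readSecret(outer));   // prints 42
    }
}
```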
6-7.11. Interfaces in Java
There are a number of situations in software engineering where separate teams
must be able to write their code without any knowledge of how the
other team's code is written. Generally speaking, interfaces are such
contracts (protocols) between different pieces of software.
Note that the method signatures have no braces and are
terminated with a semicolon. To use an interface, you write a class that
implements the interface. When an instantiable class implements an
interface, it provides a method body for each of the methods declared in
the interface. For example,
The public access specifier indicates that the interface can be used by any
class in any package. If you do not specify that the interface is public, it
will be accessible only to classes defined in the same package. An
interface can extend other interfaces, just as a class can derive from other
classes. The interface declaration includes a comma-separated list of all
the interfaces that it extends.
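A minimal sketch of declaring, extending, and implementing interfaces is given here. All names in it (InterfaceDemo, Relatable, Describable, RelatableShape, Square) are assumptions chosen for illustration.

```java
// Illustrative sketch only: every type name here is hypothetical.
class InterfaceDemo {
    // Method signatures only: no braces, each terminated by a semicolon.
    interface Relatable {
        // returns 1, 0, or -1 if this is greater than, equal to, or less than other
        int isLargerThan(Relatable other);
    }

    interface Describable {
        String describe();
    }

    // An interface may extend a comma-separated list of other interfaces.
    interface RelatableShape extends Relatable, Describable { }

    // The implementing class provides a body for every declared method.
    static class Square implements RelatableShape {
        final int side;
        Square(int side) { this.side = side; }
        int getArea() { return side * side; }

        @Override
        public int isLargerThan(Relatable other) {
            Square otherSquare = (Square) other;   // assumes other is also a Square
            return Integer.compare(getArea(), otherSquare.getArea());
        }

        @Override
        public String describe() { return "square with side " + side; }
    }

    public static void main(String[] args) {
        Relatable big = new Square(3);
        Relatable small = new Square(2);
        System.out.println(big.isLargerThan(small));   // prints 1
    }
}
```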
Object is the most general of all classes. Classes near the bottom of the
hierarchy provide more specialized behavior.
The class MountainBike inherits all the fields and methods of Bicycle and
adds the field seatHeight and a method to set it. Except for the
constructor, it is as if you had written a new MountainBike class from
scratch, with 4 fields and 5 methods.
You can use the inherited members, replace them, hide them, or
supplement them with new members:
The inherited fields can be used directly, just like any other fields.
You can declare a field in the subclass with the same name as the
one in the superclass, thus hiding it (not recommended).
You can declare new fields in the subclass that are not in the
superclass.
The inherited methods can be used directly as they are.
You can write a new instance method in the subclass that has the
same signature as the one in the superclass, thus overriding it.
You can write a new static method in the subclass that has the
same signature as the one in the superclass, thus hiding it.
You can declare new methods in the subclass that are not in the
superclass.
You can write a subclass constructor that invokes the constructor of
the superclass, either implicitly or by using the keyword super.
The following sections in this lesson will expand on these topics.
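Several of the points listed above can be sketched in a few lines. The wrapper class InheritanceDemo, the reduced Bicycle class, and the describe method are assumptions made for illustration; only the seatHeight field and the idea of a method that sets it come from the text.

```java
// Illustrative sketch only: InheritanceDemo and describe() are hypothetical.
class InheritanceDemo {
    static class Bicycle {
        public int cadence, gear, speed;

        public Bicycle(int startCadence, int startSpeed, int startGear) {
            cadence = startCadence;
            speed = startSpeed;
            gear = startGear;
        }

        public void speedUp(int increment) { speed += increment; }
        public String describe() { return "bicycle"; }
    }

    static class MountainBike extends Bicycle {
        public int seatHeight;                  // a new field that is not in the superclass

        public MountainBike(int startHeight, int startCadence, int startSpeed, int startGear) {
            super(startCadence, startSpeed, startGear);   // invoke the superclass constructor
            seatHeight = startHeight;
        }

        public void setHeight(int newValue) { seatHeight = newValue; }   // a new method

        @Override
        public String describe() {              // same signature: overrides Bicycle.describe()
            return "mountain " + super.describe();
        }
    }

    public static void main(String[] args) {
        MountainBike bike = new MountainBike(20, 10, 0, 3);
        bike.speedUp(5);                        // inherited method used directly
        System.out.println(bike.describe() + " at speed " + bike.speed);
    }
}
```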
v- Casting Objects
We have seen that an object is of the data type of the class from which it
was instantiated. For example, if we write
Casting shows the use of an object of one type in place of another type,
among the objects permitted by inheritance and implementations. For
example, if we write
then obj is both an Object and a MountainBike (until such time as obj is
assigned another object that is not a MountainBike). This is called implicit
casting. If, on the other hand, we write
Note: You can make a logical test as to the type of a particular object
using the instanceof operator. This can save you from a runtime error
owing to an improper cast. For example:
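A guarded cast of this kind might be sketched as follows. The wrapper class CastingDemo, the empty Bicycle class, the helper seatHeightOf, and the value 20 are assumptions for illustration.

```java
// Illustrative sketch only: CastingDemo and seatHeightOf are hypothetical names.
class CastingDemo {
    static class Bicycle { }

    static class MountainBike extends Bicycle {
        int seatHeight = 20;
    }

    // Returns the seat height if obj is a MountainBike, or -1 otherwise.
    static int seatHeightOf(Object obj) {
        if (obj instanceof MountainBike) {
            MountainBike bike = (MountainBike) obj;   // explicit cast, safe after the test
            return bike.seatHeight;
        }
        return -1;   // an unguarded cast here would throw ClassCastException at run time
    }

    public static void main(String[] args) {
        Object obj = new MountainBike();              // implicit cast to Object
        System.out.println(seatHeightOf(obj));            // prints 20
        System.out.println(seatHeightOf("not a bike"));   // prints -1
    }
}
```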
The Cat class overrides the instance method in Animal and hides the class
method in Animal. The main method in this class creates an instance of
Cat and calls testClassMethod() on the class and testInstanceMethod() on
the instance. The output from this program is as follows:
The version of the hidden method that gets invoked is the one in the
superclass, and the version of the overridden method that gets invoked is
the one in the subclass.
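The difference between hiding and overriding can be sketched as follows. The wrapper class HidingDemo and the returned strings are assumptions; the class and method names Animal, Cat, testClassMethod and testInstanceMethod follow the text.

```java
// Illustrative sketch only: HidingDemo and the returned strings are assumptions.
class HidingDemo {
    static class Animal {
        static String testClassMethod() { return "The static method in Animal"; }
        String testInstanceMethod() { return "The instance method in Animal"; }
    }

    static class Cat extends Animal {
        // hides Animal.testClassMethod()
        static String testClassMethod() { return "The static method in Cat"; }

        // overrides Animal.testInstanceMethod()
        @Override
        String testInstanceMethod() { return "The instance method in Cat"; }
    }

    public static void main(String[] args) {
        Animal myAnimal = new Cat();
        // a static method is chosen at compile time from the class named in the call
        System.out.println(Animal.testClassMethod());
        // an instance method is chosen at run time from the actual object
        System.out.println(myAnimal.testInstanceMethod());
    }
}
```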
viii- Modifiers
The access specifier for an overriding method can allow more, but not
less, access than the overridden method. For example, a protected
instance method in the superclass can be made public, but not private, in
the subclass.
Printed in Superclass.
Printed in Subclass
The methods inherited from Object that are discussed in this section are:
The notify, notifyAll, and wait methods of Object all play a part in
synchronizing the activities of independently running threads in a
program, which is discussed in a later lesson and won't be covered here.
CloneableObject.clone();
when you are going to write a clone() method to override the one in
Object. If the object on which clone() was invoked does implement the
Cloneable interface, Object's implementation of the clone() method
creates an object of the same class as the original object and initializes the
new object's member variables to have the same values as the original
object's corresponding member variables.
Consider this code that tests two instances of the Book class for equality:
This program displays objects are equal even though firstBook and
secondBook reference two distinct objects. They are considered equal
because the objects compared contain the same ISBN number. You
should always override the equals() method if the identity operator is not
appropriate for your class. Note that if you override equals(), you must
override hashCode() as well.
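A Book class that overrides both methods consistently might be sketched as follows. The wrapper class EqualsDemo, the isbn field, and the sample ISBN string are assumptions for illustration.

```java
// Illustrative sketch only: EqualsDemo, the isbn field, and the sample value are assumptions.
class EqualsDemo {
    static class Book {
        private final String isbn;

        Book(String isbn) { this.isbn = isbn; }

        // Two books are equal when their ISBNs are equal.
        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (!(obj instanceof Book)) return false;
            return isbn.equals(((Book) obj).isbn);
        }

        // Equal objects must produce equal hash codes, so hashCode()
        // is computed from the same field that equals() compares.
        @Override
        public int hashCode() { return isbn.hashCode(); }
    }

    public static void main(String[] args) {
        Book firstBook = new Book("0201914670");
        Book secondBook = new Book("0201914670");
        System.out.println(firstBook == secondBook);        // prints false: two distinct objects
        System.out.println(firstBook.equals(secondBook));   // prints true: same ISBN
    }
}
```

Keeping the two methods consistent matters mainly for hash-based collections such as HashMap and HashSet, which locate objects by hash code before calling equals().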
The Class class, in the java.lang package, has a large number of methods
(more than 50). For example, you can test whether the class is an interface
(isInterface()), an annotation (isAnnotation()), or an enumeration (isEnum()).
You can see what the object's fields are (getFields()) or what its methods
are (getMethods()), and so on.
System.out.println(firstBook.toString());
class ChessAlgorithm {
enum ChessPlayer { WHITE, BLACK }
...
final ChessPlayer getFirstPlayer() { return ChessPlayer.WHITE; }
...
}
System.out.format(.....);
where format is a string that specifies the formatting to be used and args
is a list of the variables to be printed using that formatting. A simple
example would be
The first parameter, format, is a format string specifying how the objects
in the second parameter, args, are to be formatted. The format string
contains plain text as well as format specifiers, which are special
characters that format the arguments of Object... args. Here, the notation
Object... args is called varargs, which means that the number of
arguments may vary.
Format specifiers begin with a percent sign (%) and end with a converter.
The converter is a character indicating the type of argument to be
formatted. Between the percent sign (%) and the converter you can have
optional flags and specifiers. There are many converters, flags, and
specifiers, which are documented in java.util.Formatter. Here is an
example:
int i = 461012;
System.out.format("The value of i is: %d%n", i);
The printf() and format() methods are overloaded. Each has a version
with the following syntax:
In this case, class X must be abstract because it does not fully implement
Y, but class XX does, in fact, implement Y. An abstract class may have
static fields and static methods. You can use these static members with a
class reference—for example, AbstractClass.staticMethod()—as you
would with any other class.
// Hello World in C
#include <stdio.h>
int main(void)
{
printf("Hello World\n");
return 0;
}
// Hello World in C#
using System;
class HelloWorld
{
static void Main()
{
Console.WriteLine("Hello World");
}
}
; Hello World program in Intel x86 assembly under DOS, ( using MASM)
.MODEL tiny
.CODE
ORG 100H
HELLO PROC
MOV AH, 09h
LEA DX, msg
INT 21h ; Display Hello World
MOV AX, 4C00h ; Exit to DOS
INT 21h
HELLO ENDP
msg DB 'Hello World$'
END
In the following section, we'll see how to write fast and efficient
assembly routines within C/C++ and Java programs.
The first instruction loads the contents of JNIEnv into ebx and the second
loads the contents of the address pointed to by ebx into eax. Since the
content of ebx is the same as that of JNIEnv, eax now has the content of
the location pointed to by JNIEnv. This means eax now contains the
starting address of the function table.
Next, we need to retrieve the contents of the entry in the function table
that corresponds to the function we want to call. To do this, we have to
multiply the zero-based index of the function by four (since each pointer
is four bytes long) and add the result to the starting address of the
function table, which we have formed in eax earlier. We do it as follows:
mov ebx, eax ; save pointer to function table
mov eax, index ; move the value of index into eax
mov ecx, 4
mul ecx ; multiply index by 4
add ebx, eax ; ebx points to the desired entry
mov eax, [ebx] ; eax points to the desired function
The content of eax can now be used to call the function. This scheme of
accessing JNI interface functions is shown in the figure below.
Example 6-31
In order to see how the JNI technique can be used to call an assembly
language program, let us consider a simple example. In our example a
Java class (ShowMessage) calls assembly language code to display a
Windows message box. If the message box is displayed, then the
assembly language code returns a string to tell the calling class that it was
successful. Otherwise, an error message is returned. In either case, the
calling class prints the returned string on the console. The Java class
looks like this:
class ShowMessage
{
public native String HelloDll(String s);
static
{
System.loadLibrary("hjwdll");
}
public static void main(String[] args)
{
ShowMessage sm = new ShowMessage();
String returnMessage = sm.HelloDll("Hello, World of JNI");
System.out.println(returnMessage);
}
}
Those familiar with JNI will notice that the Java class is identical to what
it would have been if the called native method had been written in C or
C++, which, of course, is as it should be, since the calling method need
not be aware of the language used to write the called method. All that
matters to the Java code is that it is calling a native method as declared in
the third line of the code:
We will not go into the structure of the Java class—you may find it in
other books (as simple as C++). It is the assembly language code that is
of interest to us here, and we shall examine it in some detail.
.386
.model flat,stdcall
option casemap:none
include <pathname>\include\windows.inc
include <pathname>\include\user32.inc
include <pathname>\include\kernel32.inc
includelib <pathname>\lib\user32.lib
includelib <pathname>\lib\kernel32.lib
Java_ShowMessage_HelloDll PROTO :DWORD, :DWORD, :DWORD
; This macro returns pointer to the function table in fnTblPtr
GetFnTblPtr MACRO envPtr, fnTblPtr
mov ebx, envPtr
mov eax, [ebx]
mov fnTblPtr, eax
ENDM
; This macro returns pointer to desired function in fnPtr.
GetFnPtr MACRO fnTblPtr, index, fnPtr
mov eax, index
mov ebx, 4
mul ebx
mov ebx, fnTblPtr
add ebx, eax
mov eax, [ebx]
mov fnPtr, eax
ENDM
.data
Caption db "JAV_ASM",0
ErrorMsg db "String conversion error",0
SccsMsg db "MessageBox displayed",0
.code
hwEntry proc hInstance:HINSTANCE, reason:DWORD,
reserved1:DWORD
mov eax, TRUE
ret
hwEntry endp
The mangled name can be derived manually by using the algorithm used
by JNI or generated automatically by running javah on
ShowMessage. You can do this by typing
javah ShowMessage
at the command line. The resulting file will be ShowMessage.h and will
show the mangled name. If you do use the javah approach, do not
include the output file in the assembly code. The only thing to be used is
the mangled name.
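As a rough illustration of the manual derivation, the following C sketch builds the mangled name for simple cases. It is a simplification, not the full JNI algorithm: it assumes ASCII names only, maps the package separator to "_" and a literal underscore to "_1", and ignores overloaded methods. The function names (jni_mangle, put, put_escaped) are hypothetical helpers introduced here.

```c
#include <stddef.h>
#include <string.h>

/* Appends string s to out at *pos; fails (-1) if it does not fit in cap bytes. */
static int put(char *out, size_t cap, size_t *pos, const char *s)
{
    size_t n = strlen(s);
    if (*pos + n + 1 > cap)
        return -1;
    memcpy(out + *pos, s, n + 1);   /* copies the terminating NUL as well */
    *pos += n;
    return 0;
}

/* Appends a class or method name, applying the two simple escapes handled here. */
static int put_escaped(char *out, size_t cap, size_t *pos, const char *name)
{
    for (; *name != '\0'; ++name) {
        char one[2] = { *name, '\0' };
        const char *piece = one;
        if (*name == '.' || *name == '/')
            piece = "_";            /* package separator */
        else if (*name == '_')
            piece = "_1";           /* literal underscore */
        if (put(out, cap, pos, piece) != 0)
            return -1;
    }
    return 0;
}

/* Builds "Java_<class>_<method>" in out. Returns 0 on success, -1 if out is too small. */
int jni_mangle(const char *cls, const char *method, char *out, size_t cap)
{
    size_t pos = 0;
    if (put(out, cap, &pos, "Java_") != 0) return -1;
    if (put_escaped(out, cap, &pos, cls) != 0) return -1;
    if (put(out, cap, &pos, "_") != 0) return -1;
    return put_escaped(out, cap, &pos, method);
}
```

Calling jni_mangle("ShowMessage", "HelloDll", buf, sizeof buf) yields the name used in the PROTO line of the assembly listing.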
The HelloDll procedure first gets the pointer to the function table. It
then gets the pointer to the GetStringUTFChars function to convert
the String object passed by the Java method into a UTF8 string that
can be handled by assembly language. The parameters required for
calling GetStringUTFChars are then pushed onto the stack. Note
that the right-most parameter is pushed first in accordance with the
stdcall convention followed by JNI. The function puts its return value
in eax. If this value is NULL, then there was an error. Otherwise, a valid
pointer to a UTF8 string is available in eax, which can be used to display
the message passed by the Java method. After the message is displayed,
the UTF8 string should be released as shown.
The native method returns one of two strings to the calling method
depending on whether it succeeded or failed in displaying the message
passed to it by the Java method. However, the string generated by the
native method has to be converted into a Java String object before
being returned. This is done by a call to NewStringUTF.
Note that the pointer to the function table needs to be derived only once
in a thread. That is why it is better to split the pointer translation
process into two parts, so that the first part is not executed
unnecessarily over and over again. Once you have compiled the
ShowMessage class and have created the hjwdll.lib and hjwdll.dll files,
put all three files in the same folder. Now, if you execute ShowMessage,
you will see a message box like the one in figure 6-9.
6-11. Summary
Assembly language gives the programmer direct control over the processor,
and carefully written assembly code can be faster and more compact than
code produced from a high-level language. However, in order to write a
large software system, it is more practical to use a high-level language,
like C/C++ or Java, and to use assembly language only where it pays off,
for example in efficient I/O routines. Anything you can do in C/C++ you can
also do in assembly, since C/C++ compilers convert the source code into
machine code. In this chapter we described how to write assembly language
routines within C/C++ and Java programs.
We also explained what you need to know to use the Visual C/C++ inline
assembler with Intel x86-series and compatible processors. In Visual C/C++,
inline assembly code is written in an __asm block, one instruction per
line; other compilers, such as GCC, instead take the instructions as a
string parameter to the asm statement.
C++ | Java
More or less backwards compatible with C source code. | Designed without backward compatibility with any previous language. The syntax is, however, influenced by C/C++ to make the transition easy for developers.
Allows direct calls to native system libraries. | Calls native code through the Java Native Interface.
Exposes low-level system facilities. | Runs in a protected virtual machine.
Optional automated bounds checking. | Always performs bounds checking.
Supports native unsigned arithmetic. | No native support for unsigned arithmetic.
No standardized limits or sizes for any numerical types. Only relative sizes specified. | Standardized limits and sizes of all primitive types.
Parameters passed by value, by pointer or by reference. | Parameters always passed by value; however, objects are accessed through references, and it is these references that are passed or returned by value, not the objects themselves (comparable to passing pointers by value in C++).
Explicit memory management, though third-party frameworks exist to provide garbage collection. | Automatic garbage collection only, though it can be manually tuned by the programmer.
Allows explicitly overriding types. | Rigid type safety except for widening conversions.
The C++ Standard Library has a much more limited scope and functionality than the Java standard library. | The standard library has grown with each release.
Operator overloading. | Meaning of operators is immutable.
Full multiple inheritance. | Single inheritance of classes; multiple inheritance from interfaces only.
6-12. Problems
6-1) Choose the most suitable answer for the following questions:
i) What is the correct value to return to the operating system upon the
successful completion of an executable program?
A. Programs do not return a value B. -1
C. 1 D. 0
ii) What is the only function all C++ programs must contain?
A. start() B. system()
C. main() D. program()
iii) What punctuation is used to show the start and end of code blocks?
A. { } B. -> and <-
C. BEGIN and END D. ( and )
iv) What punctuation ends most lines of C++ code?
A. . B. ;
C. : D. '
v) Which of the following is a correct comment in C/C++?
A. */ Comments */ B. ** Comment **
C. /* Comment */ D. { Comment }
vi) Which of the following is not a variable type in C language?
A. float B. real
C. int D. double
vii) Which of the following is the operator to compare 2 variables?
A. := B. =
C. equal D. ==
viii) Which is not a proper prototype?
A. int funct(char x, char y); B. void funct();
C. double funct(char x) D. char x();
ix) What purpose do classes serve?
A. data encapsulation
B. providing a convenient way of modeling real-world
objects
C. simplifying code reuse
D. all of the above
x) Which is not a protection level provided by classes in C++?
A. protected B. hidden
C. private D. public
xi) What value must a destructor return?
A. Pointer to the class. B. Object of the class.
C. Status code showing whether the class is destructed
correctly
D. Destructors do not return a value.
6-2) Write a C-program that sorts a table of 100 strings and arranges them
in alphabetical order, in the same array.
6-6) Write a C-Program that hooks the timer interrupt and turns the
speaker on and off.
6-7) Write a C-program that opens a text file named list.txt, in write
mode, using the fopen() function. How can you accelerate opening the file
by using Assembly routines instead of the fopen() function?
6-8) Write the output of the following Java program (Welcome3.java) and
rewrite it using the printf() method (instead of println).
// Listing of Welcome3.java
public class Welcome3 {
    // main method begins execution of Java application
    public static void main( String args[ ] )
    {
        System.out.println( "Welcome\n to \n Java \n Programming!" );
    } // end method main
} // end class Welcome3
1. import javax.swing.JOptionPane;
2. public class Addition {
3. public static void main(String args[ ])
4. {
5. String Number1, Number2;
6. int number1, number2, sum;
7. Number1= JOptionPane.showInputDialog("Enter 1st integer" ;
8. Number2 = JOptionPane.showInputDialog("Enter 2nd integer");
9. number1 = Integer.parseInt( Number1 );
10. number2 = Integer.parseInt( Number2 );
11. sum = number1 + number2;
12. JOptionPane.showMessageDialog("The sum is " sum, "Results
13. null ", JOptionPane.PLAIN_MESSAGE );
14. System.exit( 0 );
15. }
16. }
6-11) Show how to use JNI to replace the input/output code in the above
Java programs with equivalent Assembly code.
6-13. Bibliography
[6] http://www.intel.com
[7] http://www.microsoft.com
Memory Interfacing
with Microprocessors
Contents
7-1. Introduction
7-2. Bus Timing of Memory Read/Write Operations
7-2.1. Memory Read Timing
7-2.2. Memory Write Timing
7-2.3. Wait States in 80x86 Microprocessors
7-2.4. Pentium Processor Bus Timing
7-2.5. Bus Cycle Time & Bus Bandwidth of 80x86 Processors
7-3. Memory Address Decoding
7-4. ROM & Its Interface Circuits
7-5. RAM (SRAM, DRAM) & Its Interface Circuits
7-5.1. SRAM Interfacing
7-5.2. Cache Memory and Content Addressable Memory (CAM)
7-5.3. DRAM Interfacing (EDO, SDRAM, DDR, RAMBUS)
7-5.4. DRAM Interfacing with 16-bit Data Bus
7-5.5. DRAM Interfacing with 32-bit Data Bus
7-5.6. DRAM Interfacing with 64-bit Data Bus
7-5.7. DRAM Modules
7-5.8. DRAM Controllers
7-6. Memory Requests
7-7. Checking Memory Errors
7-7.1. Parity Checking
7-7.2. Error Checking & Correction (ECC)
7-8. Serial Memory Devices
Prof. Dr. Muhammad El-SABA
Introduction to Microprocessors & Interface Circuits CHAPTER 7
7-1. Introduction
Memory is one of the most important components in microprocessor-based
systems, like computers and embedded control systems. The computer's
basic input/output system (BIOS) routines have to be permanently stored in
read-only memory (ROM). Every time a computer is started up, programs are
loaded from secondary memory (usually a hard disk) into the computer main
memory, which is a random access memory (RAM). Therefore, every computer
contains several types of memory devices, as shown in figure 7-1. These
memory devices differ in capacity, speed, and theory of operation. In this
chapter we briefly discuss the various aspects of memory interfacing in
computer systems, in general, and with 80x86 microprocessors, in
particular.
Primary storage devices are faster than secondary storage devices. The
most popular example of primary storage is the RAM (Random Access Memory)
that we use in modern computers and PCs. The following figure depicts the
memory interface circuit to the Intel 8088 microprocessor in the early IBM
PCs.
Fig. 7-3. Memory interface circuit to the Intel 8088 microprocessor in IBM PC.
[Figure: timing diagram over clock states T1-T4, showing CLK, S0-S2, A/D (address valid, then data valid for memory read), ALE, MEMR, DT/R and DEN.]
Fig. 7-4(a). Timing diagram of memory read cycle in 8086/8088 microprocessors. All
signals (except for CLK, S0, S1, S2 ) are generated by the 8288 bus controller.
The DEN signal should be kept high during a minimum setup time
(sometimes called time data valid to write going high or TDVWH).
Again, if the memory is slower than the microprocessor, the
microprocessor will issue wait states until memory becomes ready for
write operations.
[Figure: timing diagram over clock states T1-T4, showing CLK, S0-S2, A/D (address valid, then data valid for memory write), ALE, MEMW, DT/R and DEN.]
When the memory device is not fast enough, it holds the READY signal low,
and the microprocessor inserts WAIT states (in addition to the original 4
clock states) until the memory presents its data and drives the READY
signal high.
1- During T1, the Pentium CPU issues the address, ADS, W/R, and M/IO
signals. The MEMR or MEMW signals (sometimes denoted MRD, MWT) can be
generated from these signals by simple logic.
2- During T2, the data bus is sampled at the rising clock edge at the end
of T2.
3- Memory wait states are inserted into the timing by controlling the BRDY
input (to the CPU from external memory devices). BRDY should be 0 (active)
at the end of T2; otherwise, additional T2 states (wait states) are inserted.
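The "simple logic" mentioned in step 1 can be sketched in C as follows. This is only an illustration under stated assumptions: M/IO high means a memory cycle and W/R high means a write, active-high flags are used for readability (on a real board the strobes are usually active-low), and timing qualification by ADS and the clock is ignored. The names bus_cmd_t and decode_bus_cmd are hypothetical.

```c
/* Command strobes derived from the Pentium cycle-definition signals. */
typedef struct {
    int memr;  /* memory read  */
    int memw;  /* memory write */
    int ior;   /* I/O read     */
    int iow;   /* I/O write    */
} bus_cmd_t;

/* m_io: 1 = memory cycle, 0 = I/O cycle; w_r: 1 = write, 0 = read. */
bus_cmd_t decode_bus_cmd(int m_io, int w_r)
{
    bus_cmd_t c;
    c.memr = ( m_io && !w_r);
    c.memw = ( m_io &&  w_r);
    c.ior  = (!m_io && !w_r);
    c.iow  = (!m_io &&  w_r);
    return c;
}
```

Exactly one of the four flags is set for any combination of the two inputs, which is what a two-input decoder built from a few gates would produce.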
[Figure: Pentium bus timing over two consecutive T1-T2 read cycles, showing CLK, ADDR, ADS, W/R (READ), DATA and BRDY.]
only 2 clock periods, when there are no wait states. Hence the bus cycle
time for such processors is given by:
Bus Cycle Time = (2 + W) * (1 / Clock Frequency), where W is the number of wait states.
For instance, the bus cycle time of an 80386 operating at 20 MHz, with zero
wait states, is given by: Bus Cycle Time (386) = (2 + 0) * (1/20 MHz) = 100 ns.
And the bus cycle time of an 80486 operating at 50 MHz, with one wait state,
is given by: Bus Cycle Time (486) = (2 + 1) * (1/50 MHz) = 60 ns.
The so-called bus speed is given by the inverse of the bus cycle time.
Also, the bus bandwidth is given by the product of the bus speed and the
width of the data bus:
Bus Bandwidth = Bus Speed * Data Bus width (in bytes) (7-4)
Example 7-1:
Calculate the bus speed and the bus bandwidth of an 80486 operating at
50 MHz, with zero wait states and transferring data over a 32-bit data bus.
Solution:
Bus Cycle Time (486) = (2 + 0) * (1/50 MHz) = 40 ns
Bus speed = 1/(40 ns) = 25 MHz
Bus bandwidth = 25 (MHz) x 4 (bytes) = 100 MB/s
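The calculation above can be reproduced with a few small C helpers. This is a minimal sketch; the function names are illustrative, and the number of basic bus clocks is passed as a parameter (2 for the processors of this example, 4 for the 8086/8088).

```c
/* Bus cycle time in ns: (base clocks + wait states) * clock period. */
double bus_cycle_ns(int base_clocks, int wait_states, double clock_mhz)
{
    return (base_clocks + wait_states) * (1000.0 / clock_mhz);
}

/* Bus speed in MHz: the inverse of the bus cycle time. */
double bus_speed_mhz(double cycle_ns)
{
    return 1000.0 / cycle_ns;
}

/* Bus bandwidth in MB/s: bus speed times the data-bus width in bytes.
   bus_width_bits is assumed to be a multiple of 8. */
double bus_bandwidth_mbs(double speed_mhz, int bus_width_bits)
{
    return speed_mhz * (bus_width_bits / 8);
}
```

With bus_cycle_ns(2, 0, 50.0) the helpers return 40 ns, 25 MHz and 100 MB/s for a 32-bit bus, matching Example 7-1.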
However, it should be noted that the bus speed is usually limited by the
external bus type used on the motherboard hosting the microprocessor. For
instance, the ISA bus, which is a 16-bit bus, has a maximum speed of about
8 MHz. The EISA bus, which is a 32-bit bus, offers a higher bandwidth. The
later PCI bus, which is 32 or 64 bits wide, runs at 33 or 66 MHz, and its
PCI-X extension admits higher speeds (133 MHz and above).
It should also be noted that the bus speed and the bus bandwidth are
measures of the computer performance, because they express how fast the
communication is between the microprocessor and memory or I/O devices.
Example 7-2:
Calculate the number of wait states that should be used when a 10 ns
ROM is used with a Pentium operating at 100 MHz, given that the ROM
selection circuits add a delay of 15 ns.
Solution:
The zero-wait-state bus cycle of a Pentium operating at 100 MHz is
2 x 10 ns = 20 ns, which is shorter than the total time needed to access
the ROM (10 + 15 = 25 ns). Adding one wait state is therefore enough to
make the microprocessor bus cycle long enough (it is then 3 x 10 ns = 30 ns)
to access the ROM data. In general, the number of wait states W for a
microprocessor with a two-clock bus cycle (such as the 80386 or the
Pentium) may be found using the following inequality:
(2 + W) * 10 ns ≥ 25 ns, which gives W ≥ 0.5, so W = 1
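The inequality can be solved for any clock period and access time with a short C helper. This is a sketch using integer nanoseconds; the function name wait_states_needed is illustrative.

```c
/* Smallest number of wait states W such that
   (base_clocks + W) * clock_period_ns >= access_time_ns. */
int wait_states_needed(int base_clocks, int clock_period_ns, int access_time_ns)
{
    /* Clock periods needed to cover the access time, rounded up. */
    int clocks = (access_time_ns + clock_period_ns - 1) / clock_period_ns;
    int w = clocks - base_clocks;
    return w > 0 ? w : 0;
}
```

For the values of Example 7-2, wait_states_needed(2, 10, 25) returns 1.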
Fig. 7-6(a). Memory address decoding of 2 ROM chips, using a simple inverter.
Fig. 7-6(b). Memory address decoding of several memory chips, using a decoder.
out diagram of the 2716 EPROM chip. The M2716 (from Motorola) is a
16k bit (2k x 8 bit) UV erasable and electrically programmable memory.
Pin-out of the 2716 EPROM (24-pin package):
Pin  Signal      Pin  Signal
 1   A7          24   VCC
 2   A6          23   A8
 3   A5          22   A9
 4   A4          21   VPP
 5   A3          20   CS
 6   A2          19   A10
 7   A1          18   PD / PGM
 8   A0          17   DO7
 9   DO0         16   DO6
10   DO1         15   DO5
11   DO2         14   DO4
12   GND         13   DO3
Pin Description
A0-A10 Address lines
CS Chip Select
DO0-DO7 Output lines
PD / PGM Power down / Program
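[Figure: address lines A0-A12 of the 8088 go to inputs A0-A12 of each 2764 EPROM; A13-A15 go to the select inputs I0-I2 of a 74LS138 decoder, whose outputs Q0-Q7 drive the chip-select (CS) inputs of the eight EPROMs; A16-A19, MEMR and RESET are labeled at the decoder enable inputs G2A, G2B and G1; the EPROM data outputs are O0-O7.]
Fig. 7-8. Interfacing 8088 to eight 2764 (8k x 8 bit) EPROM chips.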
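The decoding performed by such a circuit can be sketched in C. The sketch assumes, as an illustration only, that the decoder is enabled when A16-A19 are all 1, so that the eight 8 kB EPROMs occupy F0000h-FFFFFh; the actual enable condition depends on the gating shown in the figure. The function names are hypothetical.

```c
#include <stdint.h>

/* Returns the selected EPROM (0..7), or -1 if the address lies outside
   the decoded block. Assumption: the block is enabled when A19..A16 = 1111. */
int eprom_select(uint32_t addr)
{
    addr &= 0xFFFFFu;                    /* the 8088 has 20 address lines */
    if ((addr >> 16) != 0xFu)
        return -1;
    return (int)((addr >> 13) & 0x7u);   /* A15..A13 choose one of 8 chips */
}

/* Offset inside the selected 8k x 8 EPROM, taken from A12..A0. */
uint16_t eprom_offset(uint32_t addr)
{
    return (uint16_t)(addr & 0x1FFFu);
}
```

Under this assumption the 8088 reset address FFFF0h falls in the last EPROM (chip 7) at offset 1FF0h.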
[Figure: RAM block with its control lines CS, WE and OE.]
There are two basic types of RAM, namely: static RAM (SRAM) and
dynamic RAM (DRAM). Basic memory devices can be fabricated using
different semiconductor technologies, such as the standard CMOS or the
standard bipolar technologies. Both types of RAM are volatile -- they lose
their contents when the power is turned off.
[Figure: SRAM cell with a word line and complementary storage nodes A and A'.]
Fig. 7-10. Conventional SRAM Cell, with 6 MOSFET transistors (6T cell).
Caches usually make use of random access, or of an even faster access
method called "associative addressing". Associative memories are also
commonly known as content-addressable memories (CAM). In a CAM, any
stored item can be accessed by using the contents of the item in
question. The field chosen to access the CAM is called a KEY. As shown
in figure 7-12(a), a CAM has a match output to indicate which words
contain a key. The CAM unit cell is similar to the 6T SRAM cell, with 4
additional match transistors. The items stored in a CAM can be viewed as
having a two-field format: KEY and DATA, where KEY is the stored address and
Fig. 7-12(b). Cache memory operation. The KEY and DATA fields of the cache
memory are represented here by xi and value (xi).
Level one cache (L1-Cache) is the highest speed memory in the system
and is most often integrated with the CPU chip itself. The L1-cache is
sometimes divided into 2 parts; namely, the instruction cache (I-Cache)
and the data cache (D-Cache). However, some CPUs do not have an on-chip
cache, and the L1-cache is then a high-speed external SRAM tightly coupled
to the CPU, with the capability of operating at near-CPU speeds.
High-speed SRAM that works in this manner is very expensive, so typically a
price versus performance analysis is done to select the most cost-effective
cache configuration for a particular system. Unfortunately, the cache is
not usually large enough to contain the entire executable code base, so
the CPU must periodically go off-chip for instructions and data. When the
CPU is forced to make external accesses (to memory or other I/O
devices), the main memory performance becomes a critical issue. A
way to solve this problem is to build a two-level (or three-level) caching
system, as shown in figure 7-12(b).
The 80486 and later processors work in this fashion. The first level is the
on-chip cache (typically 16kB with 10 ns access time). The next level,
between the on-chip cache and the main memory, is a secondary cache
(L2-Cache) built on the computer system motherboard. A typical L2-Cache
contains from 64kB to 2MB of memory. A common size on PC systems is
512 kB.
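[Figure: cache hierarchy. Inside the CPU, the execution unit (EU) accesses the L1-Cache; on a cache miss the request goes to the L2-Cache, then the L3-Cache, then RAM.]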
You might ask, "Why bother with a two-level cache? Why not use a
higher capacity SRAM (e.g., 512kB) in a one-level cache?" Well, the
L2-Cache generally does not operate at zero wait states, because the
circuitry to support 512 kB of 10 ns access time memory would be too expensive.
Therefore, most system designers use slower memory, which requires one
or two wait states. However, this is still much faster than main memory.
Combined with the on-chip cache, you can get better performance.
[Figure: DRAM cell connected to a word line and a bit line.]
Fig. 7-13(a). Conventional DRAM Cell, with one MOS transistor and one capacitor (1T-1C)
[Figure: DRAM block diagram with RAS and CAS inputs, and timing waveforms of RAS and CAS.]
Fig. 7-14. General block diagram of a DRAM chip and its timing diagram.
1. The control signals are all initially inactive (high). A memory cycle is
started by applying the row address to the address inputs, followed by a
falling edge of RAS. This latches the row address and "opens" the row,
transferring data in the row to the buffer. The row address can then be
removed from the address inputs since it is latched on-chip.
2. With RAS still active, the column address is applied to the address
pins and CAS is made active as well. This selects the desired bit or
bits in the row, which subsequently appear at the data output(s). By
additionally activating WE the data applied to the data inputs can be
written into the selected location in the buffer.
4. Deactivating RAS causes the data in the buffer to be written back into
the memory array.
Figure 7-15(a) depicts the pin-out diagram of the 41256 DRAM chip.
Also, table 7-2 indicates the pin assignment of this chip. Note that the
chip is organized as 256k x 1 bit (Din/Dout). Figure 7-15(b) depicts how the
8088 microprocessor can be interfaced to eight 4164 (64k x 1 bit) DRAM
chips, via the 74LS245 bi-directional buffers. In this figure, the address
pins are multiplexed by the 74LS158 (as required by the DRAM).
Multiplexing the address pins saves pins on the DRAM chip, but usually
requires additional logic in the system to properly generate the address
and control signals, not to mention further logic for refresh. Therefore,
DRAM chips are usually preferred where a small pin count and a low cost
per bit matter more than simple interfacing. In such cases, the
additional cost for the control logic is outweighed by the lower price.
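The address multiplexing described above can be sketched in C for a 64k x 1 device such as the 4164, whose 16-bit cell address is presented as two 8-bit halves on the same pins. Which half serves as the row address is a board-level choice; this sketch assumes the low byte is the row and the high byte is the column. The type and function names are hypothetical.

```c
#include <stdint.h>

/* The two address halves presented in turn on the DRAM address pins. */
typedef struct {
    uint8_t row;   /* applied first, latched on the falling edge of RAS */
    uint8_t col;   /* applied second, latched on the falling edge of CAS */
} dram_addr_t;

/* Splits a 16-bit cell address for a 64k x 1 DRAM.
   Assumption: low byte = row address, high byte = column address. */
dram_addr_t dram_split(uint16_t addr)
{
    dram_addr_t a;
    a.row = (uint8_t)(addr & 0xFFu);
    a.col = (uint8_t)(addr >> 8);
    return a;
}
```

In hardware, the same selection is made by the multiplexers, which switch the DRAM address pins from one half of the address to the other between the RAS and CAS strobes.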
Pin-out of the 41256 DRAM (16-pin package):
Pin  Signal      Pin  Signal
 1   A8          16   GND
 2   Din         15   CAS
 3   WR          14   Dout
 4   RAS         13   A6
 5   A0          12   A3
 6   A2          11   A4
 7   A1          10   A5
 8   VCC          9   A7
Pin Description
A0-A8 Address lines
Din Input data (1 bit)
Dout Output data (1 bit)
WR Write enable
RAS Row Address strobe
CAS Column Address strobe
VCC 5V Power supply
Fig. 7-15(b). Interfacing 8088 with a bank of eight 4164 DRAM chips (each chip is
64k x1 bit), via 74LS158 multiplexers and 74LS245 bi-directional data buffer.
Fig. 7-17(a). 30-pin SIMM memory module and its pin-out diagram
Today, the DDR4-SDRAM standard aims for data rates between 2133 and
4266 MT/s, with DRAM voltages of 1.1 V to 1.2 V. Figure 7-17(e)
shows some photographs of the above mentioned memory modules. The
standardization authority JEDEC has set standards for speeds of DDR
SDRAM, divided into two parts: the first specification is for memory
chips and the second is for memory modules. Table 7-5 lists these
specifications.
Table 7-5. Specifications for the early SDRAM modules (DDR and DDR2)
Note that PC100 is the SDRAM standard that meets the Intel PC100
specification. Intel created this specification to enable RAM
manufacturers to make chips that work with Intel's i440BX chipset. This
chipset was designed to operate at clock frequency of 100 MHz, on a 64-bit bus.
As faster chipsets appeared, new standards, like PC2-6400 appeared.
Fig. 7-18(b). Interfacing a 1MB DRAM in 2 banks, via the 8205 DRAM controller.
Such errors frequently happen during data transmission or data storage in
memory. The reliability of a memory system can be improved by
employing error detection and correction codes (EDCC). This may be
achieved by various techniques, such as parity checking and error checking
and correction (ECC).
The original IBM PC specifications required that all RAM of the main
memory should have a parity bit, to check for errors. This means that an
additional parity bit is added to every 8 bits (byte) of main memory.
Unfortunately, while parity allows for the detection of single-bit errors, it
does not provide a means of determining which bit is in error to correct it.
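The behaviour of a parity bit can be illustrated with a short C sketch. It assumes even parity (the ninth bit makes the total number of 1s even); a system using odd parity would simply invert the bit. The function names are illustrative.

```c
#include <stdint.h>

/* Even-parity bit for one byte: 1 when the byte holds an odd number of 1s,
   so that the nine stored bits always contain an even number of 1s. */
unsigned parity_bit_even(uint8_t data)
{
    unsigned p = 0;
    while (data != 0) {
        p ^= (data & 1u);   /* toggle for every 1 bit */
        data >>= 1;
    }
    return p;
}

/* Returns 1 if a byte read back from memory agrees with its stored parity bit. */
int parity_ok(uint8_t data, unsigned stored_parity)
{
    return parity_bit_even(data) == (stored_parity & 1u);
}
```

Flipping any single bit of the byte makes parity_ok return 0, but the check cannot tell which bit changed, and flipping two bits goes undetected.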
Fig. 7-21. Logic symbol of the 74ABT853 data transceiver with parity generator.
Fig. 7-22 Logic diagram of the 74ABT853 data transceiver with parity generator.
There are several types of serial EEPROMs, but most of them fall into
either a 2-wire or 3-wire interface category. The 2-wire interface, called
I2C or Inter-Integrated Circuit, uses only two wires, regardless of how
many chips are attached.
For microcontroller systems, these little chips offer a nifty way to store a
small amount of data, using only a few of the port pins, and without
raising the system cost. They are usually specified to retain the data for
10 years and to endure about 100,000 write operations before failure.
The first hard disk drive (HDD) was introduced in 1956 as the IBM 350 disk
storage unit, a component of the IBM 305 RAMAC computer. It required fifty
24-inch disks to store data and cost about $35,000. In 1973, IBM
introduced the IBM 3340 hard disk unit, known as the Winchester (IBM's
development code name). The recording head of this drive rides on a
thin air gap, 0.0005 mm thick, over the rotating hard disks. The descriptor
"hard" is used because the inner disks that hold data in a hard drive are
made of a rigid aluminum alloy. These thin disks (called platters) are
coated with a much improved magnetic material and last much longer
than a plastic floppy diskette. The longer life of a hard drive is also a
function of the disk drive read/write head: the heads do not contact the
storage media in a hard disk drive, whereas in a floppy drive the
read/write head does contact the media, causing wear. For years, hard
disk drives were confined to mainframes and minicomputers. After the
introduction of the IBM PC in 1981, hard disk drives also became a
standard component of most personal computers.
[Figure: disk surface showing tracks and sectors.]
In 1997, IBM announced the highest capacity desktop PC hard disk drive of
its time, with a breakthrough technology called Giant Magneto-resistive
(GMR) heads. The first HDD with GMR heads was the IBM Deskstar, a
16.8 GB drive. Figure 7-17 depicts the hard disk drive and the disk
organization. The hard disk drive is composed of several disks and
several read/write heads. The heads are arranged as a movable comb.
Each surface of a disk is divided into concentric tracks and each track is
subdivided into sectors. Each sector can hold 512 bytes of data or more.
The tracks of the same diameter on all surfaces of the hard disk assembly
form what is called a logical cylinder.
The data on the disk can be addressed by the surface number, the track
number and the sector number. The electronic circuits of the HDD
can identify the first sector of a given track using an outer timing track
(in hard disks) or an index hole (in floppy diskettes). The hard disk
capacity can be calculated as follows. Assuming a HD with N recording
surfaces, T tracks per surface, S sectors per track, and B bytes per
sector, the capacity is N x T x S x B bytes.
The access time of a HD is the time delay between receiving the data
address and the beginning of data transfer. In moving head HDD's, the
access time is the sum of the track seek time and the rotational delay (or
latency) time. The track seek time is dependent on the relative distance
between the head initial track and requested track positions, and its
average value is about 10ms. The latency time is the time taken for the
head to be positioned on a requested address, after it has been positioned
on the requested track. The average latency time of a HDD is estimated as
the time of a half revolution, and its average value is about 5 ms.
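These relations can be put into a few lines of C. This is a minimal sketch with illustrative function names: the capacity is taken as the product of surfaces, tracks per surface, sectors per track and bytes per sector, and the average latency as the time of half a revolution at a given rotational speed in rpm.

```c
#include <stdint.h>

/* Capacity in bytes: surfaces * tracks per surface * sectors per track
   * bytes per sector. */
uint64_t disk_capacity_bytes(uint32_t surfaces, uint32_t tracks,
                             uint32_t sectors, uint32_t bytes_per_sector)
{
    return (uint64_t)surfaces * tracks * sectors * bytes_per_sector;
}

/* Average rotational latency in ms: the time of half a revolution. */
double avg_latency_ms(double rpm)
{
    return 0.5 * 60000.0 / rpm;
}

/* Average access time in ms: average seek time plus average latency. */
double avg_access_ms(double avg_seek_ms, double rpm)
{
    return avg_seek_ms + avg_latency_ms(rpm);
}
```

For example, a drive spinning at 6000 rpm has an average latency of 5 ms, so with a 10 ms average seek time avg_access_ms(10.0, 6000.0) gives 15 ms.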
The so-called disk controller is an interface circuit that controls the disk
speed and the head motion, and provides data encoding and interfacing
services. The ST-506, the oldest disk controller, was capable of
transmitting data at a maximum speed of 1 MB/s. The Integrated Drive
Electronics (IDE) systems incorporate the disk drive with its controller
interface, which can be attached to the computer motherboard through a
simple cable. The access time of an IDE drive is in the order of 10 ms and
its speed is 10 MB/s. Some enhanced versions of the IDE (EIDE) can transfer
data at 33 MB/s. The data transfer rate of the so-called SCSI (Small
Computer System Interface) controllers can attain 50 MB/s or even higher
speeds. The HDD is usually connected to the PC motherboard via parallel
Advanced Technology Attachment (PATA) or serial ATA (SATA)
cables. More details about PATA and SATA can be found in Chapter 9.
Fig. 7-31. Schematic diagram of a compact disk (CD) and its optical head assembly.
The flash devices are configured as NAND flash or NOR flash. NOR and
NAND flash get their names from the structure of the interconnections
between memory cells, as shown in figure 7-34.
Fig. 7-35. Flash memory cards of different sizes and form factors.
Nowadays, most new PCs have built-in slots for a variety of memory
cards: Memory Stick, CompactFlash, SD, etc. Some digital gadgets
support more than one memory card type to ensure compatibility. Fig. 7-35(b)
shows the Fujitsu 1MB flash memory MBM29LV800. This flash
memory is organized as 1M bytes of 8 bits or 512K words of 16 bits. The
MBM29LV800 features single 3V power supply operation for both read and
write functions. These devices can electrically erase the entire chip, or
all bits within a sector simultaneously, via Fowler-Nordheim tunneling. A
sector is typically erased and verified in 1 sec (if already
preprogrammed). The bytes/words are programmed one byte/word at a time
using the EPROM programming mechanism of hot electron injection. Figure
7-36 depicts the architecture of the 2GB NAND flash memory HY27HU08AG.
Also, the following table depicts the pin-out diagram and pin assignments
of this flash memory. Recently, some companies, like Samsung Electronics,
succeeded in producing NAND flash memory chips with a 32GB capacity per
chip. Such Flash memory
chips can be used in huge capacity memory modules, which are able to
store up to 64GB of data, or 40 movies.
Footnote 3: For more details about USB, refer to Chapter 9.
out 2GB, 4GB, 8GB and 16GB capacity flash memories. High speed has
become a standard for modern flash drives and capacities of up to 256 GB
have come on the market, as of 2010.
A group called the Open NAND Flash Interface Working Group (ONFI)
has developed a standardized low-level interface for NAND flash chips.
This allows interoperability between conforming NAND devices from
different vendors. The ONFI specification version 1.0 was released on
December 28, 2006. It specifies:
7-11. Summary
SRAM stands for static random access memory. PCs, routers and servers
include SRAM in their hardware. Unlike dynamic random access memory
(DRAM), SRAM does not need to be periodically refreshed to retain its
information, which helps it to conserve power. SRAM is
usually used as system cache, inside the CPU and on the PC motherboard.
SRAM employs many transistors, typically 4 to 6 transistors for each
bit (see footnote 4), as shown in Fig. 7-6(c), giving it faster speed but less
storage capacity. Because SRAM does not need to be refreshed, it is also
faster than DRAM. The typical access time of SRAM is 5-10 ns, in contrast
to a typical access time of 60 ns for DRAM. Figure 7-7 depicts the block
diagram of an SRAM chip, like the 4008.
4
Some RAMs, called FeRAMs (ferroelectric RAMs), use only one transistor per bit.
DRAM stands for dynamic random access memory. Unlike SRAM,
the chip needs to be periodically refreshed (recharged) to keep the
information on it from fading. DRAM has higher power consumption and
higher storage capacity than SRAM. A later version of DRAM, called single
data rate synchronous DRAM (SDR SDRAM), led to faster computing and higher
memory capacities. The following figure depicts the DRAM cell:
The next table summarizes the RAM technologies and their applications:
RAM Technology | Application | Access Speed | Ports | Characteristics
Static RAM (SRAM) | Level-1 and level-2 cache memory | Fast | One | More expensive than DRAM
Burst SRAM (BSRAM) | Level-2 cache memory | Fast | One | SRAM in burst mode
DRAM | Main memory, low-cost video | Slow | One | A generic term for any kind of dynamic (refreshed) RAM
FPM (Fast Page Mode) DRAM | Main memory, low-cost video | Slow | One | Prior to EDO DRAM, the most common type of DRAM
EDO (Extended Data Out) DRAM | Main memory, low-cost video | 5-20% faster than FPM DRAM | One | Uses overlapping reads (one can begin while another is finishing)
BEDO (Burst EDO) DRAM | Main memory and low-cost video | Faster than EDO DRAM | One | Not widely used, not supported by processor chipset makers
EDRAM (Enhanced DRAM) | Level-2 cache memory | 15 ns SRAM, 35 ns DRAM | One | Contains a 256-byte SRAM inside a larger DRAM
Nonvolatile RAM (NVRAM) | Preset phone numbers | Fast | One | Battery-powered RAM
Synchronous DRAM (SDRAM) | Main memory | See forms of SDRAM | One | Generic term for DRAMs with a synchronous interface
JEDEC SDRAM | Main memory | Fast | One | Dual-bank architecture; most common form of SDRAM
PC100 SDRAM | Main memory | Intended to run at 100 MHz | One | An Intel specification designed to work with their i440BX
Double Data Rate (DDR) SDRAM | Main memory | Up to 200 MHz | One | Activates output on both the rising and falling clock edges (double data rate)
7-12. Problems
7-1) Discuss the different methods, which may be used for memory
addressing. Show how to decode an address for a ROM system using
simple NAND gates, 74LS138/74LS139 decoders. Show how to use a
PAL (programmable array logic) for address decoding of memory systems
7-2) Show how to use PAL16L8 for address decoding of sixteen 27512
EPROM memory devices (64k x 8 bits) interfaced to a Pentium
microprocessor at locations FFF80000H-FFFFFFFFH. Write down the
PAL program, to be used for the PAL16L8.
7-3) Draw a schematic diagram showing how the memory is organized in
the IBM PC, which is equipped with 8088 processor in maximum mode
7-4) Draw a schematic diagram showing how to interface 256kB RAM to
8088 microprocessor (8-bit data)
7-5) Draw a schematic diagram showing how to interface 256kB RAM to
8086 microprocessor (16-bit data)
7-6) Draw a schematic diagram showing how to implement a 32-bit
memory interface for 80386/80486 processors, using four 8-bit DRAM
banks (each up to 1GB).
Hint: each memory bank is connected to only 8 data lines. For instance,
the first memory bank is connected to D0-D7, bank 2 is connected to
D8-D15, and so on. The address lines A0 and A1 are used for bank
selection, while the other 30 address lines A2-A31 are used for
addressing memory locations inside each memory bank (up to 1GB).
7-7) Draw a schematic diagram showing how to implement a 64-bit
memory interface for Pentium processors (64-bit external data bus), using
eight 8-bit DRAM banks (each up to 512MB).
7-8) Consider a moving-arm disk storage device that has the
following parameters. Estimate the disk capacity and the average latency of
the disk, and calculate the data transfer rate of the whole drive.
- Number of recording surfaces: 8
- Number of tracks/recording surface: 200
- Track storage capacity: 64 kbit/track
- Disk rotation speed: 2400 revolutions per minute (RPM)
Hint: The data transfer rate may be calculated as (track storage capacity)
/ (time taken to read/write a track). The average time taken to read/write a
track may be approximated as 60/RPM seconds.
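The hint of problem 7-8 can be checked numerically. The following Python sketch is only an illustration: it assumes that 64 kbit means 64 x 1024 bits, that the average latency equals half a revolution, and that one track is transferred at a time. The function name disk_figures is hypothetical.

```python
def disk_figures(surfaces, tracks_per_surface, bits_per_track, rpm):
    """Return (capacity in bits, average latency in s, transfer rate in bit/s)."""
    capacity_bits = surfaces * tracks_per_surface * bits_per_track
    rotation_time = 60.0 / rpm             # time of one revolution, in seconds
    avg_latency = rotation_time / 2.0      # assumption: half a revolution on average
    transfer_rate = bits_per_track / rotation_time  # one track per revolution
    return capacity_bits, avg_latency, transfer_rate


# Parameters of problem 7-8 (assumption: 1 kbit = 1024 bits)
capacity, latency, rate = disk_figures(8, 200, 64 * 1024, 2400)
print("Capacity     :", capacity // 8, "bytes")
print("Avg. latency :", latency * 1e3, "ms")
print("Transfer rate:", rate, "bit/s")
```

If the drive could read all 8 surfaces in parallel, the transfer rate would scale accordingly; the sketch follows the single-track formula of the hint.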
7-13. Bibliography
[7] http://www.intel.com
Introduction to Microprocessors & Interface Circuits CHAPTER 8
Input/Output Interface
Circuits for Microprocessors
Contents
8-1. Introduction (I/O Transfer Modes)
8-2. Methods of Addressing I/O Ports
8-2.1. I/O address Space
8-2.2. Memory-mapped I/O
8-3. I/O Instructions
8-3.1. Register I/O Instructions
8-3.2. Block I/O Instructions
8-4. Protected I/O
8-5. Designing I/O interfaces in 80x86 systems
8-5.1. Implementing Simple Input Ports Using 74LS244 Buffers
8-5.2. Implementing Simple Output Ports Using 74LS373 Latch
8-6. The 8255 Programmable Peripheral interface (PPI) Chip.
Example 8-1. Basic I/O Mode
Example 8-2. Basic I/O Mode
Example 8-3. Keyboard Scanner & 7-Segment Display
Example 8-4. Square Wave generator (BSR Mode).
Example 8-5. Input from ADC
Example 8-6. Stepper Motor Control
8-7. I/O with Handshaking Capabilities
8-7.1. I/O with Handshaking Capabilities
Example 8-7. I/O with handshaking (Mode 1)
8-7.2. Bidirectional I/O with Handshaking Capabilities
Example 8-8. Bidirectional I/O with handshaking (Mode 2)
8-7.3. CPU services for I/O Control
8-1. Introduction (I/O Transfer Modes)
Interfacing is the process of connecting a microprocessor to external
devices. We have seen so far that microprocessors can access data
from I/O ports as well as memory. In this chapter we present the main
principles of I/O interfacing in any microprocessor-based system, with
emphasis on x86 processors. There exist several modes of data
input/output in microprocessor-based and computer systems:
[Figure: a CPU connected to three I/O interfaces through the data, address and control buses.]
The first two I/O modes are directly serviced by the microprocessor, whereas
the other two modes are serviced by specialized chips. In this chapter, we
discuss the I/O operations of the x86 microprocessors, from the following
perspectives:
In addition, we’ll discuss the microprocessor services for smart I/O data
transfer, like interrupts and I/O with handshaking. We’ll also discuss how
the 80x86 can be interfaced to the 8237 DMA controller or an I/O
processor.
The program can specify the address of the port in two ways:
The instructions IN and OUT move data between a register and a port in
the I/O address space. The instructions INS and OUTS move strings of
data between the memory address space and I/O ports. Like words in
memory, 16-bit ports should be aligned at even-numbered addresses so that
the 16 bits can be transferred in a single bus access. An 8-bit port may be
located at any I/O address, either even or odd.
IN (Input from port) transfers a byte, word, or dword from an input port to
AL, AX or EAX. If a program specifies AL with the IN instruction, the
processor transfers 8 bits from the selected port to AL. If a program
specifies AX with the IN instruction, the processor transfers 16 bits from
the port to AX. If a program specifies EAX with the IN instruction, the
processor transfers 32 bits from the port to EAX.
IN eAX,port# Or IN eAX,DX
where port# is the immediate value of input port address and eAX indicates
the accumulator name (AL or AX or EAX, according to the port size). As
we mentioned above, the input port address may be pointed at by a value
inside the DX register.
Again, port# is the immediate value of the output port address and eAX
indicates the accumulator name (AL or AX or EAX, according to the port
size). Also, the output port address may be pointed at by the DX register.
Block I/O instructions use either ESI or EDI to designate the source (for
OUTS) or destination memory address (for INS). For each transfer, ESI or
EDI is automatically either incremented or decremented, as specified by the
direction flag bit (DF) in the flag register.
The string I/O primitives can operate on byte strings, word strings, or
doubleword strings. After each transfer, the memory address in ESI or EDI
is updated by 1 for byte operands, by 2 for word operands, or by 4 for
doubleword operands. The value in the direction flag (DF) determines
whether the processor automatically increments ESI or EDI (DF=0) or
whether it automatically decrements these registers (DF=1).
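The index update rule can be modeled in a few lines. The following Python sketch is an illustration only (the names OPERAND_SIZE, next_index and rep_addresses are hypothetical); it lists the memory addresses that a repeated string I/O instruction would touch, given the operand size and the direction flag.

```python
OPERAND_SIZE = {"byte": 1, "word": 2, "dword": 4}


def next_index(index, operand, df, width=32):
    """Return the new ESI/EDI value after one string I/O transfer."""
    step = OPERAND_SIZE[operand]
    if df:                       # DF = 1: the register is decremented
        step = -step
    return (index + step) & ((1 << width) - 1)   # wrap like a hardware register


def rep_addresses(start, operand, df, count):
    """Addresses used by 'count' transfers and the final register value."""
    addresses = []
    index = start
    for _ in range(count):
        addresses.append(index)
        index = next_index(index, operand, df)
    return addresses, index


addrs, final = rep_addresses(0x1000, "word", df=0, count=3)
print([hex(a) for a in addrs], hex(final))
```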
INS (Input String from Port) transfers a byte, word or doubleword string
element from an input port to memory.
INS dest,DX transfers a byte, word or doubleword from the hardware port
specified in DX to ES:EDI, even if a memory
destination operand “dest” is supplied. The mnemonics INSB, INSW, and
INSD are variants that explicitly specify the size of the operand. For INSB,
INSW and INSD no operands are allowed and the size is determined by the
mnemonic.
Combined with the REP prefix, the OUTS instruction can move a block of
information from a series of consecutive memory locations indicated by
DS:ESI to an output port.
Instructions that deal with I/O need to be restricted but also need to be
executed by procedures executing at privilege levels other than zero. The
IOPL defines the privilege level needed to execute I/O-related instructions
(IOPL=0 means the highest privilege and IOPL=3 means the lowest privilege).
The IN, INS, OUT, OUTS, STI and CLI instructions can be executed in
protected mode without further checks only if the current privilege level is
at least as privileged as the IOPL, that is, CPL ≤ IOPL numerically.
The 80386 and later processors have the ability to selectively trap
references to specific I/O addresses. The structure that enables selective
trapping is called the I/O Permission Bit Map.
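The protection check can be sketched as follows. This Python model is a simplified illustration of the protected-mode rule for IN/OUT/INS/OUTS (not of virtual-8086 mode): access is granted when CPL is numerically less than or equal to IOPL; otherwise every byte of the addressed port must have a 0 bit in the I/O permission bit map. The function name io_access_allowed and the bitmap layout as a Python bytes object are assumptions made for the example.

```python
def io_access_allowed(cpl, iopl, port, width, io_bitmap):
    """Return True if an I/O access of 'width' bytes at 'port' is permitted.

    io_bitmap: bytes object; bit n of the map corresponds to port n,
               0 = access allowed, 1 = access trapped.
    """
    if cpl <= iopl:                       # privileged enough: no bitmap check
        return True
    for p in range(port, port + width):   # every byte of the port is checked
        byte_index, bit = divmod(p, 8)
        if byte_index >= len(io_bitmap):  # ports beyond the map are denied
            return False
        if (io_bitmap[byte_index] >> bit) & 1:
            return False
    return True


bitmap = bytes([0b11110000])              # ports 0-3 allowed, ports 4-7 trapped
print(io_access_allowed(3, 0, 2, 1, bitmap))   # 8-bit access to port 2
print(io_access_allowed(3, 0, 3, 2, bitmap))   # 16-bit access to ports 3 and 4
```

Note that the bit map applies to the I/O instructions only; STI and CLI depend on the IOPL comparison alone.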
Fig. 8-2(a). Implementation of an input port using the 74LS244 octal buffer. Note that
G1 and G2 are active low and each one controls only 4 data bits of 74LS244 .
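[Schematic: 8 DIP switches with 22k pull-up resistors to VCC feed the 74LS244 inputs (pins 2, 4, 6, 8, 11, 13, 15, 17); the corresponding outputs (pins 18, 16, 14, 12, 9, 7, 5, 3) drive the data bus lines D0-D7; the enable inputs 1G (pin 1) and 2G (pin 19) are both driven by SEL.]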
Fig. 8-2(b). Implementation of a simple input port using 74LS244 octal buffer and 8 dip
switches
Fig. 8-3(a). The implementation of an output port using the 74LS373 octal latch.
[Schematic: data bus lines D0-D7 feed the 74LS374 inputs (pins 3, 4, 7, 8, 13, 14, 17, 18); the outputs O0-O7 (pins 2, 5, 6, 9, 12, 15, 16, 19) drive 8 LEDs connected to VCC through 680-ohm resistors; OC is pin 1 and CLK is pin 11.]
Fig. 8-3(b). Implementation of a simple output port using an octal latch and 8 LEDs
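[Schematic: data bus lines D0-D7 feed the 74LS374 inputs (pins 3, 4, 7, 8, 13, 14, 17, 18); the outputs O0-O7 (pins 2, 5, 6, 9, 12, 15, 16, 19) drive the segments a-g and the decimal point of a 7-segment display through 270-ohm resistors; the CLK input is gated from the address lines A0-A7 and IOW.]
Fig. 8-3(c). Implementation of an output port by octal latch and 7-segment display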
Note that the 74LS374 contains 8 edge-triggered D-type flip-flops, whereas
the 74LS373 contains 8 transparent D-type latches. On the positive edge
transition of the clock (CLK) in the 74374, or while the control signal (C) of
the 74373 is high, the outputs are set equal to the inputs. Note also that if
you make use of a common anode (CA) 7-segment display, then you should
invert all data inputs. This may be done either by hardware inverters or by
software. Instead of using 8 data lines (O0-O7) to drive the 7-segment display,
one can use only 4 lines and a BCD-to-7-segment decoder, as shown in the next
figure.
Of course, additional driving circuits may be needed to deliver more current
to the output devices, as shown in the following figure.
PA3 1 40 PA4
PA2 2 39 PA5
PA1 3 38 PA6
PA0 4 37 PA7
RD 5 36 WR
CS 6 35 RESET
GND 7 8255 34 D0
A1 8 33 D1
A0 9 32 D2
PC7 10 31 D3
PC6 11 30 D4
PC5 12 29 D5
PC4 13 28 D6
PC0 14 27 D7
PC1 15 26 VCC
PC2 16 25 PB7
PC3 17 24 PB6
PB0 18 23 PB5
PB1 19 22 PB4
PB2 20 21 PB3
Fig. 8-4. Pin-out diagram of the 8255 PPI (or PIO) chip.
As shown in the figure, the three ports are all 8 bits wide (PA0-PA7, PB0-PB7,
and PC0-PC7). One can select a certain port at a given time by the address
lines (A0 and A1) together with the chip select (CS), as indicated in table 8-1.
The read (RD) and write (WR) control signals are active low and can be
connected to IOR and IOW of the processor system bus.
Table 8-1. Port selection map of the 8255 PIO chip
Selected Port CS A1 A0
PA 0 0 0
PB 0 0 1
PC 0 1 0
Control Register 0 1 1
Chip not selected 1 x x
The 8255 chip has an internal control register, which can be selected for
writing or reading via the address lines (A0, A1), as indicated in table 8-1.
Figure 8-5 shows the control register word mapping. As shown in the figure, the
control register word is used to program the input/output ports according
to the following modes:
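Bit(s)    Field               Values
D7        Chip mode           1 = I/O, 0 = BSR
D6 D5     PA mode select      00 = Mode 0, 01 = Mode 1, 1x = Mode 2
D4        PA direction        1 = Input, 0 = Output
D3        PCU direction       1 = Input, 0 = Output
D2        PB mode select      0 = Mode 0, 1 = Mode 1
D1        PB direction        1 = Input, 0 = Output
D0        PCL direction       1 = Input, 0 = Output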
Fig. 8-5. Control register word of the 8255 PIO chip. PCL means the lower 4-bits of
Port C, while PCU means the upper 4-bits of Port C.
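The bit layout of Fig. 8-5 can be turned into a small helper that assembles the I/O-mode control word. The following Python sketch is only an illustration; the function name mode_word and its parameter names are hypothetical.

```python
def mode_word(pa_mode=0, pa_in=False, pcu_in=False,
              pb_mode=0, pb_in=False, pcl_in=False):
    """Build the 8255 I/O-mode control word (D7 = 1) from the fields of Fig. 8-5."""
    if pa_mode not in (0, 1, 2):
        raise ValueError("PA mode must be 0, 1 or 2")
    if pb_mode not in (0, 1):
        raise ValueError("PB mode must be 0 or 1")
    return (0x80                     # D7 = 1: I/O mode (not BSR)
            | (pa_mode << 5)         # D6 D5: PA mode select
            | (int(pa_in) << 4)      # D4: PA direction, 1 = input
            | (int(pcu_in) << 3)     # D3: PCU direction
            | (pb_mode << 2)         # D2: PB mode select
            | (int(pb_in) << 1)      # D1: PB direction
            | int(pcl_in))           # D0: PCL direction


# PA input, PB output, PC output (the configuration asked for in Example 8-1)
print(f"{mode_word(pa_in=True):02X}H")
# PA output, PB input, PCU output, PCL input
print(f"{mode_word(pb_in=True, pcl_in=True):02X}H")
```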
In addition, the 8255 can be operated in BSR mode (bit set/reset mode). In
this mode, only individual bits of port PC can be programmed. When port
PC is used as status/control for PA or PB, the bits of PC can be set or reset
using the BSR mode. When the RESET pin is activated high, it clears the
control register and the default mode is selected (all ports are set as input
ports).
Example 8-1 (Basic I/O Mode).
Show how to configure the 8255 ports as follows:
port PA as input, port PB as output and port PC (both PCL and PCU) as
output. Proceed as follows:
i) Draw the circuit diagram, which contains the 8255 and the micro-
processor address, data, and control read/write lines.
ii) Determine the port addresses which will be assigned to PA, PB, PC
as well as the control register of the 8255
iii) Determine the control word (byte) which you’ll use
iv) Write an assembly program that inputs data from port A and then sends
this data to both ports B and C (the output ports).
Solution:
Assume we’ll use the microprocessor 1st 8 address lines (A0 through A7)
to generate addresses for the 8255 ports, as follows:
i) The first 2 address lines (A0, A1) are connected to (A0, A1) of the 8255,
and the Chip select (CS) is gated from (A2 through A7) as shown in figure
(8-6) so that CS = 110110
ii) When the address lines A0 through A7 are gated as shown in figure, the
port addresses are as follows:
PORT               A7-A2 (CS)   A1   A0   Address
A                  110110       0    0    D8H
B                  110110       0    1    D9H
C                  110110       1    0    DAH
Control register   110110       1    1    DBH
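The port addresses follow directly from the chip-select pattern on A7-A2 and the two register-select bits A1 and A0. The following Python sketch shows the arithmetic; the function name port_map is hypothetical.

```python
def port_map(cs_pattern):
    """Map the 8255 registers to 8-bit I/O addresses.

    cs_pattern: string of 6 bits, the values of A7..A2 that activate CS.
    """
    if len(cs_pattern) != 6 or set(cs_pattern) - {"0", "1"}:
        raise ValueError("cs_pattern must be 6 binary digits (A7..A2)")
    base = int(cs_pattern, 2) << 2            # A1 = A0 = 0 selects port A
    names = ("PA", "PB", "PC", "CTRL")        # selected by A1 A0 = 00, 01, 10, 11
    return {name: base | offset for offset, name in enumerate(names)}


for name, address in port_map("110110").items():
    print(f"{name:4s} {address:02X}H")
```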
[Schematic: the microprocessor data lines D0-D7 connect to D0-D7 of the 8255; A0 and A1 connect to A0 and A1 of the 8255; A2-A7 are gated to generate CS; IOR and IOW drive RD and WR. PA is the input port, while PB and PC are output ports.]
Fig. 8-6. Connecting the 8255 PIO chip with a microprocessor address, data, and
control lines
In the IBM PC, the 8255 chip is used in I/O mode 0, with PA address = 60H, PB
address = 61H, PC address = 62H and control register address = 63H. Also, the
default control word is 99H.
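[Schematic: the microprocessor data lines D0-D7 connect to D0-D7 of the 8255; A0 and A1 connect to A0 and A1 of the 8255; A2-A7 are gated to generate CS; IOR and IOW drive RD and WR. PA is an output port, PB is an input port, PCU is output and PCL is input.]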
Fig. 8-7. Connecting the 8255 with the microprocessor address, data, and read/write
control lines
Solution:
i) The first 2 address lines (A0, A1) are connected to (A0, A1) of the 8255
chip, and the Chip select (CS) is gated from (A2 through A7) as shown in
figure (8-7) so that CS = 011111
ii) When the address lines A0 through A7 are gated as shown in figure, the
port addresses are as follows:
PORT               A7-A2 (CS)   A1   A0   Address
A                  011111       0    0    7CH
B                  011111       0    1    7DH
C                  011111       1    0    7EH
Control register   011111       1    1    7FH
MOV AL,83H ; Control word: PA output, PB input, PCU output, PCL input
OUT 7FH,AL ; Fill control register with control word
IN AL,7DH ; Input data from port PB
OUT 7CH,AL ; Output data to port PA
IN AL,7EH ; Get the 4 bits from PCL
AND AL,0FH ; Mask upper bits of AL (make them 0)
MOV CL,4 ; Load counter with 4
ROL AL,CL ; Rotate left 4 times (move the nibble to bits 4-7)
OUT 7EH,AL ; Send the 4 bits to PCU
END
Solution
i) The first 2 address lines (A0, A1) are connected to (A0, A1) of the 8255,
and the Chip select (CS) is gated from (A2 through A7) as shown in figure
8-8, so that CS = 011011
ii) When the address lines A0 through A7 are gated as shown in figure, the
port addresses are as follows:
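[Schematic: the 8086 data lines D0-D7 connect to the 8255; PA0-PA3 drive the keypad rows Row0-Row3; PB0-PB3 read the keypad columns Col0-Col3, which are pulled up to VCC through 10k resistors; PC0-PC6 drive the segments a-g of a 7-segment display through 270-ohm resistors; A0 and A1 connect to A0 and A1 of the 8255; A2-A7 are gated to generate CS; IOR and IOW drive RD and WR.]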
Fig. 8-8(a). Implementation of a 16-key keypad interface, using the 8255 PIO chip.
PORT               A7-A2 (CS)   A1   A0   Address
A                  011011       0    0    6CH
B                  011011       0    1    6DH
C                  011011       1    0    6EH
Control register   011011       1    1    6FH
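[Flowchart of the key-scanning procedure: Start; scan keys, delay to debounce and scan keys again, repeating until all keys are open; then scan keys, delay to debounce and scan keys again, repeating until a key is closed; determine the key code; RET.]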
Note that the delay is obtained by looping N (5000) times. Each loop
instruction takes about 17 clocks in the 8086/8088, where the clock duration is
T = 1/(8 MHz). So, we have 17 x N x T = 10 ms, which gives N ≈ 4,700, rounded
up to N = 5,000.
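The loop count can be verified with a short calculation. The following Python sketch assumes, as in the text, about 17 clocks per LOOP iteration; the function names loop_count and loop_delay are hypothetical.

```python
import math


def loop_count(delay_s, clock_hz, clocks_per_loop=17):
    """Smallest number of LOOP iterations that gives at least delay_s seconds."""
    return math.ceil(delay_s * clock_hz / clocks_per_loop)


def loop_delay(n, clock_hz, clocks_per_loop=17):
    """Delay in seconds produced by n LOOP iterations."""
    return n * clocks_per_loop / clock_hz


print(loop_count(10e-3, 8e6))          # iterations needed for 10 ms at 8 MHz
print(loop_delay(5000, 8e6) * 1e3)     # delay in ms obtained with N = 5000
```

Rounding N up to 5,000 therefore gives a delay slightly longer than 10 ms, which is harmless for debouncing.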
D7        D6 D5 D4   D3 D2 D1 (bit select)   D0 (S/R)
0 = BSR   x  x  x    0  0  0 = PC0           1 = Set
                     0  0  1 = PC1           0 = Reset
                     0  1  0 = PC2
                     :  :  :
                     1  1  1 = PC7
In the first half cycle (Ts1) bit PC1 is set (high). So the control word in the
first half cycle is 00000011 or 03H. In the second half cycle (Ts2) bit PC1
is reset (low). So the control word in the second half cycle is 00000010 or
02H.
iii) The assembly program:
TITLE THE 8255 PPI as a Square Wave Generator (BSR Mode)
AGAIN: MOV AL,03H ; BSR control word to set PC1
OUT 7FH,AL ; Fill control register with control word
CALL DELAY ; Keep bit PC1 set for a delay time = Ts/2
MOV AL,02H ; BSR control word to reset PC1
OUT 7FH,AL ; Fill control register with control word
CALL DELAY ; Keep bit PC1 reset for a delay time = Ts/2
JMP AGAIN ; Repeat to generate a continuous square wave
END
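The two BSR control words used above follow from the bit-select field (D3-D1) and the set/reset bit (D0). The following Python sketch shows the encoding; the function name bsr_word is hypothetical, and the don't-care bits D6-D4 are assumed to be 0.

```python
def bsr_word(bit, set_bit):
    """Build the 8255 BSR control word for one bit of port C.

    bit:     0..7, selects PC0..PC7 (placed in D3-D1)
    set_bit: True to set the bit, False to reset it (placed in D0)
    D7 = 0 selects BSR mode; D6-D4 are don't-care bits, left at 0 here.
    """
    if not 0 <= bit <= 7:
        raise ValueError("bit must be in the range 0..7")
    return (bit << 1) | (1 if set_bit else 0)


print(f"{bsr_word(1, True):02X}H")    # set PC1
print(f"{bsr_word(1, False):02X}H")   # reset PC1
```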
[Waveform: output bit PC1 versus time t, a square wave with period Ts.]
Fig. 8-9. Schematic of the square wave, which is generated by the 8255 in BSR Mode
D7        D6 D5 D4   D3 D2 D1 (bit select)   D0 (S/R)
0 = BSR   0  0  0    0  0  0 = PC0           1 = Set
                                             0 = Reset
Therefore, the first control word CW1 = 01H to set PC0 or 00H to reset
PC0. The second control word (in the I/O mode) is as follows:
D7         D6 D5       D4             D3          D2          D1              D0
1          0  0        1              1           0           0               0
I/O mode   PA Mode 0   Port A Input   PCU Input   PB Mode 0   Port B Output   PCL Output
The assembly program "ADC" that sends a start of conversion pulse and
reads digital data from the ADC at the end of conversion is as follows:
TITLE The 8255 PPI as ADC-Microprocessor Interface (BSR & I/O Modes)
ADC PROC NEAR
MOV AL,01H ; Issue BSR 1st control word (set PC0)
OUT 7FH,AL ; Fill control register, now PC0 = 1
CALL DELAY
MOV AL,00H ; Issue BSR control word to reset PC0
OUT 7FH,AL ; Fill control register, now PC0 = 0. Start conversion
READ:
MOV AL,98H ; Issue I/O mode control word (Mode 0)
OUT 7FH,AL ; Now PA is input and PCU is input
IN AL,7EH ; Read port C (PCU carries the ADC status)
ROL AL,1 ; Place PC7 in carry flag bit (CF = PC7)
JC READ ; If PC7 = 1 then loop back (wait) until end of conversion
IN AL,7CH ; If PC7 = 0 then read ADC digital output into AL
RET
ADC ENDP
DELAY PROC NEAR
MOV CX,12 ; Loop count N = 12
G7: LOOP G7 ; Decrement CX and repeat until CX = 0
RET ; Return to the caller
DELAY ENDP
END
Note that, since the analog voltage needs to be constant during A/D
conversion, you need a sample & hold circuit before the ADC. The
DELAY routine is based on 17 x N x T ≈ 20 µs (with N = 12) for a 10 MHz
processor.
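[Schematic: the microprocessor data lines D0-D7 connect to D0-D7 of the 8255; A0 and A1 connect to A0 and A1 of the 8255; A2-A7 are gated to generate CS; IOR and IOW drive RD and WR. Port PA feeds a driver stage, which drives the stepper motor.]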
Fig. 8-11(a) Schematic of the stepper motor driver circuit
Solution:
The four leads of the stepper motor windings (A, B and their complements,
A-bar and B-bar) can be controlled by four bits of any port of the 8255.
Consider PA0-PA3 to control the stepper motor, as shown in figure (8-11a). In
order to rotate in the clockwise direction, we have to feed the 4 motor coils
with the step positions (33H, 66H, 0CCH, 99H) from PA. This allows the ROL and
ROR instructions to rotate the position bit pattern to the next step in the
forward or reverse direction. So, if POS=33H, then the instruction ROL POS,1
will result in POS=66H. Similarly, if POS=66H, then ROL POS,1 will result in
POS=0CCH, and so on.
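The step sequence produced by rotating the position byte can be checked with a few lines of Python. This sketch only models the 8-bit ROL and ROR operations; the function names rol8, ror8 and step_sequence are hypothetical.

```python
def rol8(value, count=1):
    """Rotate an 8-bit value left, like ROL on an 8-bit operand."""
    count %= 8
    return ((value << count) | (value >> (8 - count))) & 0xFF


def ror8(value, count=1):
    """Rotate an 8-bit value right, like ROR on an 8-bit operand."""
    return rol8(value, 8 - (count % 8))


def step_sequence(pos, steps, forward=True):
    """Return the list of position bytes sent to the port for 'steps' steps."""
    sequence = []
    for _ in range(steps):
        pos = rol8(pos) if forward else ror8(pos)
        sequence.append(pos)
    return sequence


print([f"{p:02X}H" for p in step_sequence(0x33, 4)])
print([f"{p:02X}H" for p in step_sequence(0x33, 4, forward=False)])
```

Since only PA0-PA3 are wired to the motor in the text, the lower nibble of each byte is the pattern actually seen by the coils.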
We may also use a driver between the output bits (PA0-PA3) and the
stepper motor coils. The driver may be 4 inverter gates, with protection
diodes (to bypass the back EMF of motor coils). One can also use the
ULN2003 chip, which has 7 inverters with protection diodes, as shown in
figure (8-11b). The motor common wires should be connected to +VCC.
Fig. 8-11(b) Schematic of the stepper motor driver circuit and the layout of ULN2003.
The following table indicates the most common drive modes of a stepper
motor.
1-Wave drive (1 phase is ON at a time, ABAB)
2-Full-step drive (2 phases ON at a time, ABABABAB)
3-Half-step drive (1&2 phases on, ABBABA ABBABA)
1- Strobing
2- Handshaking
[Figure: a SOURCE sends Data and a Strobe signal to a DESTINATION.]
Fig. 8-12(c) depicts the handshaking signals between a PPI (source) and an
I/O device (destination), for a data output job.
[Figure: a SOURCE and a DESTINATION connected by the Request, Reply and Data lines.]
Fig. 8-12(c). Data transfer from CPU (via PPI) to Output device with handshaking
1-The CPU executes OUT and sends data from AL to the PPI
2-The PPI reads data
3- The PPI sends data to I/O device
4- PPI sets OBF to TRUE to tell I/O device the data is available & valid
5- The I/O device reads data when OBF changes from FALSE to TRUE
6- The I/O device sets ACK to TRUE to tell the PPI, it received data
7- The PPI raises an interrupt by setting INTR to TRUE
Fig. 8-12(d). Data transfer from an Input device to CPU (via PPI) with handshaking.
D7         D6 D5       D4              D3                     D2          D1              D0
1          0  1        1/0             1/0                    1           1/0             x
I/O mode   PA Mode 1   Port A In/Out   PC4,5 / PC6,7 In/Out   PB Mode 1   Port B In/Out
When the control register is loaded with the control word that indicates
that the 8255 is used in a handshaking mode (Mode 1), the port PC will be
furnished with a status word, which can be used by the I/O devices and the
microprocessor for servicing the asynchronous data transfer process. Note
that the free PC bits are PC4, PC5 when PA is output or PC6, PC7 when
PA is input. These free bits can be used as I/O bits.
Here IBFA signal means Input Buffer Full (of PA) and occupies bit PC5.
Also, INTE.A occupies bit PC4 and the two bits PC6 and PC7 are free and
can be used for input/output.
Fig. 8-13. Handshaking signals of the 8255 PIO chip for data input/output, to/from
PA and PB. The 2 free bits are determined according to whether PA is input or output
Figure 8-14(b) depicts the timing diagram of the 8255 in mode 1, for
strobed output. The handshaking signals for port PA in either of the above
two cases (input with handshaking or output with handshaking) are
delivered by 3 bits of port PC. Actually, 3 other bits are used for providing
the handshaking signals of port PB (PC0, PC1, PC2) and the remaining two bits
of PC are free. So, in I/O with handshaking mode, one can use PA as input
and PB as output or vice versa, or both as inputs or both as outputs.
The following example depicts how the 8255 chip can be used for issuing
data from port PA (output data) to a line printer, with handshaking. So, the
PC7 and PC6 will hold the strobing (OBFA) and acknowledge (ACKA)
signals, for port PA, respectively.
So, the first control word (CW1) for I/O mode is 10100000 (A0H). The
assembly program is as follows:
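[Schematic: port PA of the 8255 sends the data lines D0-D7 to the printer (LPT1); PC7 (OBFA) drives the printer Strobe input; PC6 (ACKA) receives the printer ACK signal; PC3 carries INTR.A.]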
Fig. 8-15. Data output from the 8255 PPI (port PA) to a line Printer with
Handshaking signals.
[Figure: the 8255 in Mode 2 (bidirectional port PA). Port C carries the PA handshaking signals: PC3 = INTRA, PC7 = OBFA, PC6 = ACKA, PC5 = IBFA, PC4 = STBA. Left panel: PB in Mode 0, with PC0-PC2 free as I/O bits; control word D7-D0 = 1 1 x x x 0 1 1/0. Right panel: PB in Mode 1 output, with PC1 = OBFB, PC2 = ACKB and PC0 = INTR.B; control word D7-D0 = 1 1 x x x 1 0 x.]
Note that there exist two interrupt enable signals here, INTE.A1
(associated with OBFA) and INTE.A2 (associated with IBFA). These
interrupt enable signals are gated inside the 8255 with OBFA and IBFA to
generate the interrupt request signal INTR.A as follows:
Solution:
The following program depicts the bidirectional operation of PA. We make
use of the masking bytes MB5 and MB7 to check IBFA (PC5) and OBFA
(PC7) of the status word.
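The masking idea can be illustrated in isolation. In the following Python sketch the mask values 20H and 80H are derived from the bit positions PC5 and PC7 named in the text; the function name decode_status is hypothetical, and the sketch only reports whether each bit is 1 without assuming any signal polarity.

```python
MB5 = 0x20   # mask for bit 5 of the status word (IBFA on PC5)
MB7 = 0x80   # mask for bit 7 of the status word (OBFA on PC7)


def decode_status(status):
    """Extract the IBFA and OBFA bits from a status byte read from port C."""
    return {
        "IBFA": bool(status & MB5),   # AND with the mask isolates bit 5
        "OBFA": bool(status & MB7),   # AND with the mask isolates bit 7
    }


print(decode_status(0xA0))   # both bits are 1
print(decode_status(0x20))   # only bit 5 is 1
```

In assembly, the same test is an AND (or TEST) of AL with the mask, followed by a conditional jump on the zero flag.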
1- Polling Control
2- Interrupt Control
As we’ve seen in the previous example, the CPU was checking regularly
for the presence of the INTR.A signal (interrupt request for PA). This
process is called polling.
Polling service is simple, but it burdens the CPU with overhead. Interrupt
service is more efficient than polling, but needs somewhat more complex
software handling and additional hardware. Handling interrupts and their
service routines was previously presented in chapter 1 and chapter 4.
The first holds a 16-bit value for the address in memory and the second holds
a 16-bit value for the number of bytes (8-bit channels) or words (16-bit
channels). The last two are used to monitor the DMA transfer. The 8237
has two electrical signals for each channel, named DRQ (DMA Request)
and DACK (DMA Acknowledge).
There are additional signals with the names HRQ (Hold Request), HLDA
(Hold Acknowledge), EOP (End of Process), and the bus control signals
MEMR (Memory Read), MEMW (Memory Write), IOR (I/O Read), and
IOW (I/O Write).
The 8237 DMA is known as a ``fly-by'' DMA controller. This means that
the data being moved from one location to another does not pass through
the DMA chip and is not stored in the DMA chip. Consequently, the DMA
can only transfer data between an I/O port and a memory address, but not
between two I/O ports or two memory locations.
The I/O device signals the 8237 on the DRQ (DMA Request) line.
The 8237 then signals the processor that it wants to take control of the
bus by activating the HRQ (Hold Request) line to the processor.
The system waits for the main processor to finish whatever it is doing
and then disconnects it from the bus and activates the HLDA (Hold
Acknowledge) line, which causes the processor to be locked out. (The
processor is not actually halted, but merely left idle.)
The 8237 takes control of the bus, signals the device that it is ready via
the DACK (DMA Acknowledge) line and transfers data to or from the
device.
When the transfer is completed, the DMA is disconnected from the bus,
all the lines are reset and the processor is reconnected to the bus and
carries out any tasks that are demanded of it.
The four 8-bit DMA channels of the 8237 can be used with a variety of
adaptors, like:
The remaining three 16-bit channels can also be used with a variety of adaptors,
such as:
Fig. 8-18. Connection of the Intel 8237 DMA controller with the 8088 microprocessor,
in IBM PC
i-Single mode. A single byte (or word) is transferred. The DMA must
release and re-acquire the bus for each additional byte. This is commonly-
used by devices that cannot transfer the entire block of data immediately.
The peripheral will request the DMA each time it is ready for another
transfer. The standard PC-compatible floppy disk controller (NEC 765)
only has a one-byte buffer, so it uses this mode.
ii-Block/Demand mode. Once the DMA acquires the system bus, an entire
block of data is transferred, up to a maximum of 64kB. If the peripheral
needs additional time, it can assert the READY signal to suspend the
transfer briefly. READY should not be used excessively, and for slow
peripheral transfers, the Single Transfer Mode should be used instead. The
difference between Block and Demand is that once a Block transfer is
started, it runs until the transfer count reaches zero. DRQ only needs to be
asserted until -DACK is asserted. Demand Mode will transfer one or more
bytes until DRQ is de-asserted, at which point the DMA suspends the
transfer and releases the bus back to the CPU. When DRQ is asserted later,
the transfer resumes where it was suspended.
Older hard disk controllers used Demand Mode until CPU speeds
increased to the point that it was more efficient to transfer the data using
the CPU, particularly if the memory locations were above the 16M mark.
iii-Cascade mode. This mechanism allows a DMA channel to request the
bus, but then the attached peripheral device is responsible for placing the
address information on the bus instead of the DMA. This is also used to
implement a technique known as ``Bus Mastering''. When a DMA
channel in Cascade Mode receives control of the bus, the DMA does not
place addresses and I/O control signals on the bus like the DMA normally
does when it is active. Instead, the DMA only asserts the -DACK signal for
the active DMA channel.
So, it is up to the peripheral connected to that DMA channel to provide
address and bus control signals. The peripheral has complete control over
the system bus, and can do reads/writes to any address below 16M. When
the peripheral finishes with the bus, it de-asserts the DRQ line, and the
DMA controller can return control to the CPU or to another DMA channel.
The slave DMA controller then transfers data for the DMA channel that
requested it (0, 1, 2 or 3), or the slave DMA may grant the bus to a
peripheral that wants to perform its own bus-mastering, such as a SCSI
controller.
Because of this wiring arrangement, only DMA channels 0, 1, 2, 3 (of the
slave DMA) and 5, 6, 7 (of the master DMA) are usable with peripherals on
PC/AT systems; channel 4 is taken by the cascade connection.
Note that DMA channel 0 was reserved for refresh operations in early IBM
PC computers, but it is generally available for use by peripherals in
modern systems. When a peripheral is performing Bus Mastering, it is
important that the peripheral transmit data to or from memory constantly
while it holds the system bus. If the peripheral cannot do this, it must
release the bus frequently so that the system can perform refresh operations
on main memory.
As mentioned in chapter 7, the DRAM used in PCs must be
accessed frequently to keep the charge of stored bits. Since memory read
and write cycles ``count'' as refresh cycles (a dynamic RAM refresh cycle
is actually an incomplete memory read cycle), as long as the peripheral
controller continues reading or writing data to sequential memory
locations, that action will refresh all of memory.
iv-Auto-initialize mode. This mode causes the DMA to perform Byte,
Block or Demand transfers, but when the DMA transfer counter reaches
zero, the counter and address are set back to where they were when the
DMA channel was originally programmed. This means that as long as the
peripheral requests transfers, they will be granted. It is up to the CPU to
move new data into the fixed buffer ahead of where the DMA is about to
transfer it when doing output operations, and read new data out of the
buffer behind where the DMA is writing when doing input operations.
The length that is loaded is one less than the amount you expect the DMA
to transfer. The LSB and MSB of the address and length are written to the
same 8-bit I/O port, so another port must be written to guarantee that the
DMA accepts the first byte as the LSB and the second byte as the MSB of
the length and address.
Then, one has to be sure to update the Page Register, which is external to
the DMA and is accessed through a different set of I/O ports. Once all the
settings are ready, the DMA channel can be un-masked. That DMA
channel is now considered to be ``armed'', and will respond when the DRQ
line for that channel is asserted. You can refer to the 8237 data sheet for
more precise programming details. You will also need to refer to the I/O
port map for the PC system, which describes where the DMA and Page
Register ports are located.
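The values to be written during channel setup can be derived from the buffer address and length. The following Python sketch is a hedged illustration for an 8-bit channel: it splits a physical address into the page register value and the 16-bit address, and loads the count with the length minus one, as described above. The function name dma_setup_values is hypothetical, the I/O port numbers are deliberately left out (they must be taken from the PC port map), and the rejection of buffers that cross a 64 kB page is an assumption based on the 16-bit address register not carrying into the page register.

```python
def dma_setup_values(phys_addr, length):
    """Compute the bytes to program for an 8-bit DMA channel.

    phys_addr: physical start address of the buffer (below 16M)
    length:    number of bytes to transfer (1 .. 65536)
    """
    if not 0 <= phys_addr < (1 << 24):
        raise ValueError("address must be below 16M")
    if not 1 <= length <= 0x10000:
        raise ValueError("length must be between 1 and 64 kB")
    page = phys_addr >> 16                 # upper address bits go to the page register
    offset = phys_addr & 0xFFFF            # lower 16 bits go to the 8237 address register
    if offset + length > 0x10000:
        raise ValueError("buffer crosses a 64 kB page boundary")
    count = length - 1                     # the 8237 is loaded with length - 1
    return {
        "page": page,
        "addr_lsb": offset & 0xFF, "addr_msb": offset >> 8,    # written LSB first
        "count_lsb": count & 0xFF, "count_msb": count >> 8,    # written LSB first
    }


for name, value in dma_setup_values(0x12345, 512).items():
    print(f"{name:9s} = {value:02X}H")
```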
The following is a rough example of the communication protocol used when
DMA is being used to transfer data from an adapter to memory:
When the transfer begins, the DMA channel registers are loaded with
the correct address base and counter value.
The adapter is told to begin the transfer.
When the adapter has the first data ready, it signals the DMA control chip.
The DMA control chip asks the CPU for ownership of the data bus.
When ownership is granted, the DMA chip signals back to the adapter
to start sending.
At the same time the DMA puts the base address on the address bus.
The adapter puts its data on the data bus, and the RAM circuits
automatically read the data.
The DMA chip controls the transmission, and then sends a
Transmission Complete (TC) signal when the transmission is finished
(i.e. when the counter value changes from 0000H to FFFFH).
527
When the adapter senses the TC signal it uses its IRQ, to inform the
code which requested the DMA transfer that it has finished.
The program can then check to see if the transmission has run
smoothly.
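The TC condition also explains why the length loaded into the channel is one less than the number of bytes to transfer. The short simulation below is a sketch, not a model of the real chip: it simply counts how many transfers take place before a 16-bit counter rolls over from 0000H to FFFFH. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of transfers performed before the 16-bit counter rolls over
   from 0000H to FFFFH, which is the moment TC is signalled. */
uint32_t transfers_until_tc(uint16_t loaded_count)
{
    uint16_t counter = loaded_count;
    uint32_t transfers = 0;

    for (;;) {
        transfers++;              /* one byte moved in this DMA cycle  */
        if (counter == 0x0000)    /* the next decrement wraps to FFFFH */
            break;
        counter--;
    }
    assert(transfers == (uint32_t)loaded_count + 1);
    return transfers;
}
```

Loading 01FFH therefore moves 512 bytes, and loading 0000H still moves one byte.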
Figure 8-19 depicts the connection of an IOP to the main processor, via a
local bus. The communication between the host processor and the IOP may
be summarized as follows. The host processor initiates an I/O operation by
writing a message in memory to describe the I/O function to be performed.
Then the IOP reads this message from memory and carries out the I/O
operation. When the IOP finishes, it notifies the host CPU.
Fig. 8-19. Connection of an I/O Processor (IOP) to a host CPU, via a local bus.
[Figure: block diagram of the 8089 showing Channel 1 and Channel 2 (each with registers GA, GB, GC, TP, PP, IX, BC, MC, CC and PSW, and with DRQ, EXT and SINTR lines), the CCP, the ALU, the control logic and the bus control unit.]
Fig. 8-20. Functional block diagram of the Intel 8089 I/O Processor.
The 8086 prepares control blocks that describe the task to be performed,
and then dispatches the task to the IOP through a channel attention signal
(via the RQ/GT). Then the 8089 IOP reads the control block to locate a
program sequence called the "channel program", which is written in the
8089 instruction set.
The IOP performs the assigned task by executing this program. When the
IOP is done, it notifies the 8086 either through an interrupt request or by
updating a status location in memory.
Fig. 8-21. Functional block diagram of the Intel 80321 I/O Processor.
8-10. Summary
The following circuits depict the simplest form of input and output ports
in a microprocessor system.
8-11. Problems
8-2) Write a program for an 8088 microprocessor to output the word stored
in the AX register to the output ports whose addresses are 8004H and 8005H.
8-3) Show how you can design 2 output ports using the 74LS373 chips and
how to connect them to an 80x86 microprocessor as a single 16-bit port.
8-4) Show how to connect two 8255 chips to an 80x86 microprocessor system
to obtain 3 programmable 16-bit I/O ports.
8-5) The 8255 chip is connected to an 8086 microprocessor such that its
ports are assigned as follows: port A and port C as input, and port B as
output.
i) Draw the circuit diagram, which contains the 8255 and the
microprocessor address, data, and control read/write lines.
ii) Determine the port addresses which will be assigned to PA, PB,
PC as well as the control register of the 8255
iii) Determine the control word (byte) which you’ll use
iv) Write an assembly program that inputs data from port A and port
C, adds them and then sends the result to port B.
Hints: As shown in figure 8-9, the four leads of the stepper motor (WA,
WB, WC and WD) can be controlled by four bits of any port of the 8255.
Consider PA0-PA3 (of port PA) in this example as to control the stepper
motor.
8-11) In DMA transfers, the required signals and addresses are given by
a) Processor
b) Device drivers
c) DMA controllers
d) The program itself
8-12) After the completion of the DMA transfer the processor is notified by
a) Acknowledge signal
b) Interrupt signal
c) WMFC signal
d) None of the above
8-12. Bibliography
[2] http://www.hokeyball.com
[3] http://www.XBitlabs.com
[4] http://www.x86-guide.com
Interface Circuits
with
IBM PC & Compatibles
Prof. Dr. Muhammad El-SABA
Introduction to Microprocessors & Interface Circuits CHAPTER 9
9-1. Introduction (Overview of the IBM PC)
Before we go into the details of IBM PCs and how to interface them
with external circuits, it is wise to look at their internal components and
how they work. The IBM PC & compatible microcomputers are based on
the Intel 80x86 series of microprocessors.
The basic hardware configuration of an IBM PC has not changed much
over the years. A typical PC consists of the following items.
1. System Unit that contains:
o Mother Board that provides
CPU, RAM, BIOS ROM, Bus slots, Parallel and Serial I/O Ports
o Hard disk drives, Floppy disk drives and CD/DVD drives.
o Video Interface card
o Switch mode Power Supply
2. Keyboard & Mouse
3. Video Display Unit (VDU or Monitor)
4. Printer
Fig. 9-2. General block diagram of an IBM PC, with peripheral devices.
Fig. 9-3(a). Motherboard of the first IBM PC (1981) and its schematic layout
Fig. 9-3(c). Intel motherboards: DG965 for an Intel Core2 Duo microprocessor
The BIOS (basic input/output system) ROM performs what is called the
Power-On Self-Test (POST) when it boots the PC. The POST is a built-in
diagnostic program that checks the PC hardware to ensure that
everything is present and functioning properly. Another function of the
BIOS is to maintain a set of information that is critical to the operation
of your PC, but is not stored on your hard disk at all. This is called the
CMOS Settings. These settings are very important because minor changes
to them can have a major impact on how your system functions.
Expansion slots have many pins, which power the expansion cards and
connect them to the data, address and control buses. The expansion slots
are connected with the CPU via a group of signal lines (on the
motherboard) called the expansion bus. The expansion bus contains a
large number of lines for data and address as well as control signals,
and it is usually operated at a frequency lower than the
microprocessor clock. The efficiency of the expansion bus is expressed in
terms of its bandwidth. The expansion bus speed (in MB/s) is calculated
using the following equation:
Bus speed = Bus width (in Bytes) x Bus Clock (in MHz)
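As a quick check of this equation, the sketch below evaluates it for bus widths and clocks quoted in this chapter. The function name is illustrative, and the result is only a theoretical peak: a real bus needs several clock cycles per transfer, so the sustained rate is lower.

```c
#include <assert.h>

/* Peak bandwidth in MB/s = bus width in bytes x bus clock in MHz. */
unsigned bus_bandwidth_mb_s(unsigned width_bits, unsigned clock_mhz)
{
    assert(width_bits % 8 == 0);   /* width must be a whole number of bytes */
    return (width_bits / 8) * clock_mhz;
}
```

A 16-bit bus at 8 MHz gives 16 MB/s, a 32-bit bus at 33 MHz gives 132 MB/s, and a 64-bit bus at 66 MHz gives 528 MB/s.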
In the original 6MHz IBM PC/AT, and the subsequent 8MHz version, the
bus ran at the same speed as the CPU. It was not surprising that as PC
clone vendors started looking for a marketing edge over IBM, they
simply kept the bus running at the CPU speed as they boosted speeds to
12 MHz, or even faster. This led to problems with users. Boards that ran
fine in 8 MHz PCs were not reliable at faster speed. The industry settled
on 8 MHz as a standard clock speed and on the name Industry Standard
Architecture. Figure 9-6 depicts the ISA bus and its pin assignments.
Fig. 9-7(a). Overview of the ISA interface bus and an ISA card.
As shown, the upper part of the ISA bus is identical to the old 8-bit PC
bus and is divided into 'A' and 'B' sides. The lower part is divided into 'C'
and 'D' sides, which provides additional pins for the 16-bit data, 24-bit
address bus as well as supplementary interrupt requests and DMA
channels. Now, let's start describing the operation of the ISA bus with a
simple read cycle from an Input/Output port. The first thing the
microprocessor does is to drive the ALE signal high, and then to place
the port address on the A0-A19 lines. After that, the ALE signal goes low.
From now on, the address of the target port to be read is latched. Then
the ISA bus takes the IOR signal to a low level so that the addressed
device puts a data byte onto the D0-D7 data bus. The microprocessor then
reads the data bus and takes the IOR signal high again.
A write cycle to a port works this way: the microprocessor drives ALE
high, and then outputs the port address on A0-A19. Then ALE goes low
again. The microprocessor sends out the data byte to be written on the
data bus. It then asserts the IOW signal (active low). After the device
has had time to read the data byte, the microprocessor raises the IOW
signal high again. The only difference between a memory read/write cycle
and a port read/write cycle is that, in a port cycle, the MEMR and MEMW
signals are replaced by the IOR and IOW signals.
The 24-bit address lines of the ISA bus limit it to I/O cards whose
on-board RAM or ROM lies within the first 16 MB of addressable memory
space. Some video cards look for a memory aperture (also known as a
linear frame buffer), a hole in the system memory, where they can insert
and address their own memory (several megabytes). This memory aperture
overcomes the problem of page switching brought about by the assignment
of only 128 kB for the Video RAM (VRAM) in the system memory map.
Without such an aperture, the VRAM of VGA cards can only be accessed by
switching parts of it in and out of that memory range.
Interfacing I/O devices to the IBM PC via ISA bus (or PC bus) needs,
at least, the connection with the following pins (among the first 62 pins):
1- A0-A9 for address decoding (you can assign up to 1k ports)
2- IOR and IOW (both active low)
3- AEN signal: AEN = 0 when the CPU is using the bus
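The decoding rule implied by these three signal groups can be written as a small truth function. The sketch below is an assumption-laden illustration: the base address 300H and the block of four ports are hypothetical choices, and the arguments carry the logic level on each pin (0 or 1). On a real card this logic would be built from gates or a decoder chip.

```c
#include <assert.h>
#include <stdint.h>

#define CARD_BASE 0x300u   /* hypothetical base address of the card  */
#define CARD_SIZE 4u       /* hypothetical: four consecutive ports   */

/* Returns 1 when the card should respond to the current bus cycle. */
int card_selected(uint16_t addr, int aen, int ior_n, int iow_n)
{
    unsigned a = addr & 0x3FFu;        /* only A0-A9 are decoded */

    assert(aen == 0 || aen == 1);
    if (aen != 0)
        return 0;                      /* DMA owns the bus: stay quiet */
    if (ior_n != 0 && iow_n != 0)
        return 0;                      /* neither I/O strobe is low    */
    return (a & ~(CARD_SIZE - 1u)) == CARD_BASE;
}
```

Because only ten address lines are decoded, an address such as 701H aliases onto 301H and also selects the card, which is why the usable I/O space is limited to 1k ports.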
Fig. 9-8. Overview of the MCA bus and the EISA bus.
The PCI-bus uses three elegant techniques to resolve local bus problems.
The first, known as reflected-wave signaling, reduces the amount of
electrical amplification required on the signal paths and thus reduces
noise and loading problems. The second is multiplexing. Multiplexing
allows two different signals to use the same electrical path, reducing the
number of pins required for peripheral chips. The third is a protocol
letting the PCI controller receive specific configuration information from
the PCI devices themselves. Intel did not define a standard adaptor
connector for the bus, leaving that job up to a PCI-bus special-interest
group (PCI-SIG) who settled on the white 112 pin connector. The
original PCI bus had 32 data lines and could operate 4 times faster
than the ISA bus (at 33 MHz). The 64-bit data PCI bus, which operates at
66 MHz, is 4 times faster than the original PCI bus. As shown in figure
9-5, a PCI interface needs a minimum of 47 pins if it operates as a target
device and 49 pins if it works as a master device.
Fig. 9-10(b). I/O Pins and corresponding signals of the 32-bit PCI bus.
The AGP also enables graphics cards to use texture maps directly
from system memory instead of forcing them to pre-load the texture data
into the graphics card's local memory. Some authors consider the AGP as
some sort of advanced PCI. For a system with a front-side bus (FSB)
speed of 133 MHz, the AGP speed is equal to the memory bus speed.
Therefore, it can support a data transfer rate of 1066 MB/s. AGP attains
this high transfer rate because of its ability to transfer data on both the
rising and falling edges of the clock. In addition, AGP does not share
bandwidth with other devices, whereas the PCI bus shares bandwidth.
However, in recent PCs the AGP has been replaced by the PCI Express bus.
Fig. 9-12(b). AGP Architecture and its connection to the PCI bus.
Fig. 9-13. Slots of the PCI Express bus (from top to bottom: x4, x16, x1 and x16),
compared to a traditional 32-bit PCI slot (bottom).
Fig. 9-14. GPIB bus and how it connects devices to a PC, via a GPIB cable.
Fig. 9-16. Simplified model of the JTAG bus. The connector pins are:
TDI (Test Data In), TDO (Test Data Out), TCK (Test Clock), TMS (Test Mode
Select) and an optional TRST (Test Reset).
As shown in the figure above, the LIN bus is a single-wire bus connected
via a termination resistor to the positive battery node Vbat. The bus is
terminated with a pull-up resistance of 1kΩ in the master node, and
typically 30kΩ in a slave node.
LIN versus CAN
Compared to CAN, LIN offers the advantage of lower cost per node
when the bandwidth and performance of CAN are not needed. LIN's lower
cost results from the use of single-wire communications, a lower
implementation cost versus CAN, and no need for crystals in the slave
nodes. The tradeoff for LIN's lower cost is the more restrictive nature of
a single-master network and lower bandwidth, as indicated in the
following table.
Fig. 9-20. PC expansion slots. The shown motherboard has 2 ISA and 3 PCI slots
The Processor Bus: This is the highest-level bus that the chipset uses to
send information to and from the microprocessor.
The Cache Bus: Higher-level architectures, such as those used by the
Pentium processors, employ a dedicated bus for accessing the system
cache. This is sometimes called a backside bus. Conventional processors
using fifth-generation motherboards and chipsets have the cache
connected to the standard memory bus.
The Memory Bus: This is a second-level system bus that connects the
memory subsystem to the chipset and the processor. In some systems the
processor and memory buses are basically the same thing.
Fig. 9-21. Different busses and I/O devices in the IBM PC.
The Local I/O Bus: This is a high-speed input/output bus used for
connecting performance-critical peripherals to the memory, chipset, and
processor. For example, video cards, disk storage devices, and high-speed
networks generally use such a bus. The two most common local I/O buses
are the VESA local bus and the PCI bus.
The Standard I/O Bus: Connecting to the above three buses is the
standard I/O ISA bus, used for slower peripherals (modems, sound cards,
low-speed network) and also for compatibility with older devices.
Nowadays, PCI Express has replaced AGP as the most common interface
for graphics cards. PCI Express is also used for gigabit Ethernet and Wi-
Fi. However, add-on cards are still generally PCI. Sound cards, modems
and other cards with low speed are still all PCI. For this reason most
motherboards still offer legacy PCI slots.
i- Multi Drop
In multi drop topology, the devices are connected in parallel on the bus.
The data transmitted by any device is sent to all other devices and it is up
to each device to accept or reject the data. If the data on the bus matches
the device criterion, the device may read the data. Otherwise the device
will stay inactive. A contention can happen if more than one device tries
to transmit data on the bus. In order to avoid this, multi drop buses have a
mechanism of collision detection and correction.
Fig. 9-22. Bus topologies. Multi-drop, Daisy-chain and Switched hub topology.
Fig. 9-23. Cardbus and Express cards, for laptop and notebook computers
Fig. 9-24. Circuit diagram and layout of an ISA bus extension card
PCs are usually equipped with two serial ports. IBM originally called
these communications ports COM ports (COM1 and COM2). The
serial ports are built using a universal asynchronous receiver/transmitter
(UART) chip to convert data from parallel to serial and from serial to
parallel form. Serial ports usually make use of the so-called RS232
interface standard (sometimes called EIA-232-D) that specifies the data
logic levels during data transfer. Serial ports are usually used for:
1. Physical layer - Hardware equipment of the bus and the layer that
passes the bit streams to and from the network.
2. Data Link layer - Deals with the messages at the bit and byte level and
provides data transfer control between a node and the network.
3. Network layer - Sets up addresses and delivers message packets.
4. Transport layer - Controls the sequencing of message components.
5. Session layer - Manages the data coordination during communication.
6. Presentation layer - Performs data conversion & encryption.
7. Application layer - Provides the user interface to lower levels.
[Figure: waveform switching between the Space and Mark levels, showing the start bit, the data bits, the parity bit and the stop bits.]
Fig. 9-24. Serial data frame. The picture shows how data leaves the serial port for
the period character "." (0101110) with even parity.
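The structure of such a frame can be reproduced in a few lines of code. The sketch below is a hypothetical helper, not part of any BIOS or UART library: it assumes the usual asynchronous convention of a start bit at the SPACE level (logic 0), data bits sent least significant bit first, an even parity bit, and stop bits at the MARK level (logic 1).

```c
#include <assert.h>
#include <stddef.h>

/* Fill bits[] with the logic values of one asynchronous frame
   (1 = MARK, 0 = SPACE) and return the number of bits written. */
size_t build_frame(unsigned char ch, int data_bits, int stop_bits,
                   int bits[], size_t cap)
{
    size_t n = 0;
    int i, ones = 0;

    assert(data_bits >= 5 && data_bits <= 8);
    assert(stop_bits == 1 || stop_bits == 2);
    assert(cap >= (size_t)(1 + data_bits + 1 + stop_bits));

    bits[n++] = 0;                      /* start bit               */
    for (i = 0; i < data_bits; i++) {   /* data, LSB first         */
        int b = (ch >> i) & 1;
        ones += b;
        bits[n++] = b;
    }
    bits[n++] = ones & 1;               /* even parity: total number of 1s becomes even */
    for (i = 0; i < stop_bits; i++)
        bits[n++] = 1;                  /* stop bit(s)             */
    return n;
}
```

For the period character (0101110, i.e. 2EH) with seven data bits and two stop bits, the frame is 0, then 0 1 1 1 0 1 0, then parity 0, then 1 1.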
Indeed, the majority of serial ports in the past were using the RS232C
interface standard. The RS232 was first introduced in 1962 by the EIA,
and has remained widely used through the industry. In fact, the RS232 is
still supported in many microcontrollers and embedded computer
projects. The RS232 signals are represented by voltage levels with
respect to a system ground. As shown in figure 9-24, the "idle" state
(MARK) has a negative signal level, and the "active" state (SPACE) has
the positive signal level. There are 2 types of RS-232 devices. The first is
called a Data Terminal Equipment DTE device. A common example is a
computer. The other type is called a Data Communications Equipment
DCE device. A common example is a MODEM. The RS-232 interface
pre-supposes a common ground between the DTE and the DCE. This is a
reasonable assumption when a short cable connects the DTE to the DCE,
but with longer lines or connections with different grounds, this may not
be true. RS232 data is bipolar. Thus, +3 V to +12 V indicates an "ON" or
0-state (SPACE), while -3 V to -12 V indicates an "OFF" or 1-state (MARK).
Modern computer equipment ignores the negative level and accepts a
zero voltage level as the "OFF" state. In fact, the "ON" state may be
achieved with lesser positive potential. This means circuits powered by
5V are capable of driving RS232 circuits directly. However, the overall
range of the RS232 signal may be dramatically reduced when transmitted
or received.
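The voltage convention can be summarized as a simple classification. The sketch below uses the strict ±3 V thresholds given above and treats everything in between as undefined; it does not model the relaxed behaviour of modern receivers that accept 0 V as "OFF". The type and function names are illustrative.

```c
#include <assert.h>

typedef enum { RS232_MARK, RS232_SPACE, RS232_UNDEFINED } rs232_state_t;

/* Classify a line voltage (in volts, relative to signal ground). */
rs232_state_t rs232_classify(double volts)
{
    assert(volts == volts);          /* reject NaN */
    if (volts >= 3.0)
        return RS232_SPACE;          /* positive level: "ON", logic 0  */
    if (volts <= -3.0)
        return RS232_MARK;           /* negative level: "OFF", logic 1 */
    return RS232_UNDEFINED;          /* transition region              */
}

/* Logic bit carried by a state, or -1 when the level is undefined. */
int rs232_logic_bit(rs232_state_t s)
{
    if (s == RS232_MARK)  return 1;
    if (s == RS232_SPACE) return 0;
    return -1;
}
```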
In the various RS-232-like definitions this dead area may vary. RS232
has numerous handshaking lines. These lines specify the communication
protocol that controls data flow between the DTE (PC) and DCE
(external devices). Request to Send (RTS) is one of the hardware
handshaking signals. When the PC wants to send data to an external
device it sets this pin to 0. In other words, it sets the pin to 0 and says "I
want to send you data. Is it ok?" The external device (MODEM) says it is
OK to send data by setting its clear to send (CTS) pin to 0. The PC then
sends the data. Clear to Send (CTS) is the other half of hardware
handshaking. As noted above, an external device, like a MODEM, sets
this pin to 0 when it is ready to receive data from the PC.
Classical Drivers which require plus (+) and minus (-) voltage power
supplies as the 1488 series of IC's (most desktop PC's use this type).
Low power drivers which require one +5V power supply (as DS232).
This type of driver has an internal charge pump for voltage
conversion. Many industrial microprocessor controls use this type.
Low voltage (3.3V) drivers which meet the EIA-562 standard (for
laptops).
In order to do so, the UART has shift registers for parallel-to-serial and
for serial-to-parallel conversions. The UART chip also provides all the
handshaking required to control the flow of data to and from the
computer and other serial devices. There are many different types of
UART chips, but the PC COM port is based on chips compatible with the
National Semiconductor 8250. The 8250 chip was used in the serial ports
of PC/XT (based on 8088 microprocessor), and the 16450 in the serial
ports of PC/AT (80286) and then with 80386, and 80486 machines, until
early 1995. Over the years the maximum data rate provided by devices
connected to the serial ports has been steadily rising. Back in 1987, the
2.4kb/s telephone MODEM was considered fast. Until 2000, the cost
effective telephone MODEM's were transferring data at 56 kb/s.
The more recent ADSL MODEM's are so much faster (about 1 Mb/s or
higher). The serial ports must keep up with the MODEM and therefore
the UART must be faster than the MODEM. The UART chips usually
include four internal registers:
Fig. 9-25. Connection of the 8250 UART with a MODEM, for serial data
communication between two computers via a telephone line.
There are two ways to address the serial port: through the 14H BIOS
interrupt and through the 21H DOS interrupt. The 14H BIOS interrupt uses
four functions to program the serial port. Each function is selected by
assigning a value to the AH register of the CPU. We list these functions
below:
Function 00H: Initializes the serial port; sets the speed, data, stop and
parity bits.
Function 01H: Sends a character to the specified serial port.
Function 02H: Reads a character from the specified serial port.
Function 03H: Returns the state of the specified serial port.
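Function 00H expects the line parameters packed into a single byte in AL. The sketch below builds that byte using the layout commonly documented for the PC BIOS (bits 7-5 baud rate code, bits 4-3 parity, bit 2 stop bits, bits 1-0 word length); the helper and enum names are hypothetical, and the layout should be checked against the BIOS reference of the target machine.

```c
#include <assert.h>

enum parity { PARITY_NONE = 0, PARITY_ODD = 1, PARITY_EVEN = 3 };

/* Build the AL parameter byte for INT 14H, function 00H.
   Returns the byte (0..255), or -1 for an unsupported setting. */
int int14_param(unsigned baud, enum parity par, int stop_bits, int data_bits)
{
    static const unsigned rates[8] =
        { 110, 150, 300, 600, 1200, 2400, 4800, 9600 };
    int i, code = -1, value;

    for (i = 0; i < 8; i++)
        if (rates[i] == baud)
            code = i;                    /* 3-bit baud rate code */
    if (code < 0 || (stop_bits != 1 && stop_bits != 2) ||
        (data_bits != 7 && data_bits != 8))
        return -1;

    value = (code << 5)                  /* bits 7-5: baud rate    */
          | ((int)par << 3)              /* bits 4-3: parity       */
          | ((stop_bits - 1) << 2)       /* bit 2: 0 = 1 stop bit  */
          | (data_bits - 5);             /* bits 1-0: 10b=7, 11b=8 */
    assert(value >= 0 && value <= 0xFF);
    return value;
}
```

With this layout, 9600 baud, no parity, one stop bit and eight data bits give E3H, which is the kind of value a constant such as PARAM would hold.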
There are three functions in the 21H DOS interrupt related to the
operation of the serial port:
void init_port(void);
char state_port(void);
void send_byte(unsigned char);
unsigned char read_byte(void);
void keyb(void);
int tecla = 1;
//**************************************************
void init_port()
{
union REGS regs;
regs.h.ah = 0x00;
regs.x.dx = COM2;
regs.h.al = PARAM;
int86(0x14, &regs, &regs);
}//*************************************************
A single connector can be used for chaining many devices that, in the
past, had to be interfaced via Serial, Parallel or Games Ports. All kinds
of devices can be hooked to the PC through the same connector
simultaneously. These devices include:
USB requires less real estate (less space on the back plane) than existing
I/O ports and this is particularly important for laptop and hand-held PDA
systems. It reduces the number of BUS slots required on the system
board, allowing a footprint reduction for desktop systems.
Fig. 9-29. USB cables lines. Data is transferred on differential pair (D-, D+).
USB host, which is usually the computer (only one host is allowed)
USB devices (serial devices, like mice and keyboards, or hubs)
USB interconnects.
USB Specifications
The old USB 1.1 had ample bandwidth for digital gaming peripherals and
video applications, and provided cost effective connection for peripheral
devices. The main features of USB 1.1 are as follows:
USB 2.0 is 40 times faster than USB 1.1. It provides additional bandwidth
for the peripherals that may be attached to your computer. USB 2.0 moves
data at 480 Mb/s, and is backwards compatible with the old USB devices.
USB 3.0 is the newer standard for SuperSpeed Universal Serial Bus; it can
support speeds up to 5 Gb/s. USB 3.0 is backwards compatible with USB 2.0.
Much like its predecessors, USB 3.0 offers a variety of plug and receptacle
types. Note that the "B" plug on USB 3.0 is different from that of USB 2.0.
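To get a feeling for these rates, the sketch below estimates the raw transfer time of a block of data at a given signalling rate. It assumes the 12 Mb/s full-speed rate for USB 1.1 and ignores all protocol overhead, so real transfers take longer; the function name is illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Milliseconds needed to move `bytes` at `rate_mbps` megabits per
   second, ignoring protocol overhead. */
uint64_t transfer_time_ms(uint64_t bytes, uint32_t rate_mbps)
{
    assert(rate_mbps > 0);
    /* bits / (rate_mbps * 10^6 bit/s) * 1000 ms/s */
    return (bytes * 8u) / ((uint64_t)rate_mbps * 1000u);
}
```

Moving 60,000,000 bytes takes about 40 s at 12 Mb/s, 1 s at 480 Mb/s and 96 ms at 5 Gb/s; the first two figures show the factor of 40 between USB 1.1 and USB 2.0.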
Fig. 9-30. USB 3.0 male plugs. Note that the B plug is different from that of USB 2.0
D2XX driver allows direct access to a USB device via a DLL interface.
A. ACCESS.bus
The concept of ACCESS.bus (or A.b) was originally developed by
Philips Semiconductors and Digital Equipment Corp. (DEC) in the early
1990s, and was taken over by the Independent ACCESS.bus Industry
Group (ABIG). As compared to the I2C bus, ACCESS.bus adds two wires to
provide power to the connected devices (+5 V and GND). Also, A.b
supports the 100 kbit/s and 10 kbit/s modes. Compared to USB, A.b has
several advantages. One is that any device on the bus can be a master or a
slave, and a protocol is defined for selecting which role a device takes
under any particular circumstance. This allows devices to be plugged together
with A.b without a host computer. For instance, a digital camera could be
plugged directly into a printer and become the master. Under USB the
computer is always the master and the devices are always slaves. At first
the ACCESS.bus showed the potential to become an industry standard. On
the downside, A.b is much slower than USB. For this reason, it did not
attract much support from PC hardware manufacturers.
Fig. 9-33. Connection of the PC with peripheral devices, via the IrDA port.
Support for IrDA must be provided at both the hardware and software
level. Many of the system boards have an IrDA interface and hardware
support provided by the BIOS. Many Laptop Computers have an IrDA
port built in and some printers have IrDA interfaces.
Routers and bridges link two or more individual Local Area Networks
(LANs) to create an extended-network LAN or Wide Area Network
(WAN). Routers are physical devices that link multiple wired or
wireless networks. Home networks often use an Internet Protocol (IP)
wired or wireless router, where IP is the most common OSI network
layer protocol. An IP router such as a DSL MODEM router can join the
home LAN to the Internet WAN. As shown in the following figure, an
ADSL (asymmetric digital subscriber line) circuit connects an ADSL
modem on each end of a twisted-pair telephone line.
Since 2000, 802.11b has become the standard wireless Ethernet networking
technology for wireless LANs (WLANs). The 802.11b is a half duplex
protocol: it can send or receive, but not both at the same time. The
802.11b adapter cards come in two major forms, namely, PC Cards for
laptops and USB cards for desktops. In addition, there are PCI adapters
that let you plug a PC Card into a PCI slot. An 802.11b wireless network
adapter can operate in two modes, Ad-Hoc and Infrastructure. In
infrastructure mode, all your traffic passes through a wireless "access
point". In Ad-hoc mode your computer talks directly to other computers
and does not need any access point.
Fig. 9-38(a). Wide area network (WAN) and Internet connection to a local area
network (LAN) via wideband DSL modems.
Fig. 9-38(b). Wide area network (WAN) and Internet connection to a local area
network (LAN) via wideband modems and a hub or a switch.
Fig. 9-39. ATM data packets (cells) as compared to conventional RS232 frames.
The standard parallel cable has a DB25P (plug) on the computer end (a
socket is used on the computer) and a 36-pin Centronics plug on the
printer end. The cable should be shielded and should be no longer than
3m. When ASIC chips were first used to provide the parallel port, some
of these had trouble driving long cables (over 3 m) because they had
CMOS outputs rather than TTL outputs and they did not like high
capacitance loading.
Data (8 lines)
Control (4 lines)
Status (5 lines)
Both the Status and Control lines (9 lines) provide handshaking. All these
signals are connected to a 25-pin connector (DB25), as shown in figure 9-
31. All the bits have TTL logic levels. The signal lines are listed below:
Outputs:
STROBE (pin 1): Tells the printer when the 8 data bits are ready to be
read. Turns to a low logic level when the data are ready.
D0-D7 (pins 2-9): Data bits.
AUTO FD (pin 14): Tells the printer to add a line feed after each
carriage return.
INIT (pin 16): Resets the printer.
SLCT IN (pin 17): Selects the printer when it turns to a low logic level.
Inputs:
ACK (pin 10): Tells the CPU that the data has been correctly received.
BUSY (pin 11): The printer sets this line when its buffer is full. The
computer will stop sending more data.
PE (pin 12): Paper End. The printer is out of paper.
SLCT (pin 13): Tells the computer that a printer is present.
ERROR (pin 15): An error occurred. The CPU stops sending data.
The status lines (S3-S7) were used for Flow Control signals and as
Status Indicators for such things as paper empty, busy indication and
interface or peripheral errors. The data lines (D0-D7) were used to
provide data from the PC to the printer, in that direction only. Later
implementations of the parallel port allowed for data to be driven from
the peripheral to the PC.
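The five status lines appear to software as bits 3 to 7 of the status register. The sketch below decodes a status byte under the conventional PC parallel port assumptions: ERROR and ACK are active-low lines, and the BUSY line is inverted by the port hardware, so bit 7 reads 0 while the printer is busy. The structure and function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int error;      /* S3: 1 when the printer reports an error   */
    int selected;   /* S4: 1 when a printer is present/on-line   */
    int paper_out;  /* S5: 1 when the printer is out of paper    */
    int ack;        /* S6: 1 while the ACK pulse is active       */
    int busy;       /* S7: 1 when the printer cannot accept data */
} lpt_status_t;

/* Decode a byte read from the status register (base address + 1). */
void lpt_decode_status(uint8_t s, lpt_status_t *out)
{
    assert(out != NULL);
    out->error     = (s & 0x08) == 0;   /* active-low line         */
    out->selected  = (s & 0x10) != 0;
    out->paper_out = (s & 0x20) != 0;
    out->ack       = (s & 0x40) == 0;   /* active-low line         */
    out->busy      = (s & 0x80) == 0;   /* inverted by the port    */
}
```

A ready, on-line printer typically reads as DFH: no error, selected, paper present, no acknowledge pulse, not busy.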
PIN NAME       DB25   36-pin Centronics   DESCRIPTION
Strobe         1      1                   1 us pulse used to clock data into the printer
Data 0         2      2
Data 1         3      3
Data 2         4      4
Data 3         5      5
Data 4         6      6
Data 5         7      7
Data 6         8      8
Data 7         9      9
Acknowledge    10     10                  acknowledge signal from printer to PC
Busy           11     11                  used by the printer to stop the flow of data
Paper Empty    12     12                  indicates the printer has run out of paper
At boot-up of the PC, the setup routines in the BIOS look for parallel
ports on the I/O bus, and assign LPT1 to LPT3 by searching the following
I/O addresses in this order:
03BC to 03BE
0378 to 037A
0278 to 027A
Officially LPT1 uses I/O address 0378 to 037A but when the BIOS setup
routine is looking for parallel Ports it assigns the first one it finds as
LPT1. The address 03BC to 03BE was first provided by a parallel port on
IBM Monochrome display adaptor but today it is quite common to find
this address available on Parallel Port hardware. The Parallel Ports are
assigned an IRQ line as follows.
Port IRQ
LPT 1 IRQ 7
LPT 2 IRQ 7 or IRQ 5
In old 8-bit PC computers (PC/XT) IRQ7 was assigned to both LPT1 and
LPT2 but in later generation hardware IRQ5 is assigned to LPT2. The
IRQ line is not usually used by software communicating with the LPT
ports and so IRQ7 and IRQ5 are available for other I/O functions. Sound
Cards use either IRQ5, 7 or 10 as the default IRQ.
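The assignment rule described above (probe 03BC, 0378 and 0278 in that order, and name the first port found LPT1) can be sketched in C. This is only an illustration: the `present` flags are a hypothetical stand-in for the hardware probe that a real BIOS performs, and `assign_lpt` is a name chosen here, not a BIOS routine.

```c
#include <stddef.h>
#include <stdint.h>

/* Candidate base addresses, in the order the BIOS probes them. */
static const uint16_t LPT_CANDIDATES[3] = { 0x3BC, 0x378, 0x278 };

/*
 * Fills lpt[] (LPT1 first) with the base addresses of the ports found.
 * present[i] is a hypothetical flag telling whether a port answered at
 * LPT_CANDIDATES[i]; a real BIOS would test the hardware instead.
 * Unused entries are set to 0. Returns the number of ports assigned.
 */
size_t assign_lpt(const int present[3], uint16_t lpt[3])
{
    size_t count = 0;
    size_t i;

    for (i = 0; i < 3; i++) {
        if (present[i]) {
            lpt[count++] = LPT_CANDIDATES[i];
        }
    }
    for (i = count; i < 3; i++) {
        lpt[i] = 0;
    }
    return count;
}
```

With no port at 03BC, the port at 0378 becomes LPT1 and the port at 0278 becomes LPT2, which is the usual situation on machines without a monochrome display adapter.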
The Centronics protocol specifies that the DATA lines be stable from at
least 500 ns before to at least 500 ns after the STROBE pulse, and the
STROBE pulse be at least 500 ns long. These times may of course be
shortened for a specific printer, at the risk of loss of generality. Programs
using the polled mode should include a timeout counter to guard against a
permanent BUSY condition. BIOS calls and DOS functions use this
mode for printing.
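A polled-mode write with a timeout counter might look like the following sketch. It assumes the usual SPP register layout (data, status and control at offsets 0, 1 and 2), that status bit 7 reads 1 when the printer is not busy, and that control bit 0 drives STROBE. The array `sim_regs` and the helpers `port_in` and `port_out` are hypothetical stand-ins for real port I/O, so the logic can be followed without hardware.

```c
#include <stdint.h>

#define LPT_DATA     0   /* offset of the data register    */
#define LPT_STATUS   1   /* offset of the status register  */
#define LPT_CONTROL  2   /* offset of the control register */

#define STATUS_NOT_BUSY 0x80  /* status bit 7: 1 = printer can accept data */
#define CONTROL_STROBE  0x01  /* control bit 0: 1 = STROBE asserted        */

/* Simulated registers (an assumption of this sketch). On DOS the two
 * helpers below would wrap _inp() and _outp() on the real port. */
uint8_t sim_regs[3];
uint8_t port_in(int reg)             { return sim_regs[reg]; }
void    port_out(int reg, uint8_t v) { sim_regs[reg] = v; }

/* Sends one byte in polled mode. Returns 0 on success, or -1 if the
 * printer stayed busy for max_polls status reads (the timeout guard). */
int lpt_send(uint8_t byte, unsigned long max_polls)
{
    unsigned long polls = 0;

    while ((port_in(LPT_STATUS) & STATUS_NOT_BUSY) == 0) {
        if (++polls >= max_polls) {
            return -1;                   /* permanent BUSY: give up */
        }
    }
    port_out(LPT_DATA, byte);            /* data first, then STROBE */
    /* A real driver waits at least 500 ns here (data setup time). */
    port_out(LPT_CONTROL, (uint8_t)(port_in(LPT_CONTROL) | CONTROL_STROBE));
    /* ... keeps STROBE asserted for at least 500 ns ... */
    port_out(LPT_CONTROL, (uint8_t)(port_in(LPT_CONTROL) & ~CONTROL_STROBE));
    /* ... and leaves the data unchanged for at least 500 ns more. */
    return 0;
}
```

The delays are left as comments because their implementation depends on the platform; the point of the sketch is the ordering of the steps and the bounded wait on BUSY.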
int _inp(unsigned pid); // returns a byte read from the I/O port pid
int _inpw(unsigned pid); // returns a word read from the I/O port pid
unsigned _outp(unsigned pid, int value); // writes byte to I/O port pid
unsigned _outpw(unsigned pid, unsigned value); // writes word to I/O port pid
The following program shows how to send a byte to the parallel port and
how to read a byte back from it. The _outp(pid, value); function sends a
byte to a specified I/O port. The first parameter (pid) is the address of
the port to write to; it can be any unsigned integer in the range 0-65535.
The second parameter (value) is the value of the byte to send. Both
parameters can be defined as variables. The _inp(pid); function reads a
byte from the specified I/O address (pid) of the computer.
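#include <stdio.h>
#include <dos.h>
#include <conio.h>
#define DATA 0x378 // LPT1 data register
#define STATUS 0x379 // LPT1 status register
#define CONTROL 0x37A // LPT1 control register
int main (void)
{
int bits = 0x55; // byte to send, 0 <= bits <= 255
int status;
clrscr();
_outp(DATA, bits); // output data
status = _inp(STATUS); // input data (status lines)
printf("Sent %02X, status = %02X\n", bits, status);
getch();
return 0;
}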
The following example shows you how to read and write a byte from/to
the parallel port in VC++. Here are the steps to write the parallel port
interfacing application (pptest).
Start the VC++ IDE and select 'New' from the File menu. Then select "Win32
Console Application" from "Projects", enter the project name "pptest",
and click the OK button. Select "A simple Application" and click Finish.
Now open example1.cpp from "FileView" and replace the existing code
with the following code. The test board that you may build to test this
code is shown in problem 4.
#include "stdafx.h"
#include "conio.h"
#include "stdio.h"
#include "string.h"
#include "stdlib.h"
1. Unidirectional (4 bit)
2. Bi-directional (8 bit)
3. Standard Parallel Port (SPP), also called Type 1
4. DMA Type 3 (used only by IBM)
5. Enhanced Parallel Port (EPP)
6. Enhanced Capability Port (ECP)
By 1994 this development was getting out of hand, and so the IEEE set
down standard modes of operation for the parallel port in a document
titled IEEE 1284-1994, Standard Signaling Method for a Bi-directional
Parallel Peripheral Interface for Personal Computers. Before this time
there were no set standards as to how the parallel port should behave
when connected to devices such as printers, scanners, external disk
drives, etc. The IEEE defined five modes of operation. These modes take
care of the various types of hardware that have developed over the years
since the PC was released.
The IEEE incorporated the EPP standard into its document 1284-1994,
so we now have two standards for EPP: the original EPP standard,
version 1.7, and the IEEE 1284 version. Because the differences are
minor, peripherals can be designed to cope with both variations, but
older peripherals made to the original EPP 1.7 standard may not work
with IEEE 1284 ports.
3
Download this DLL from this site: http://www.logix4u.net/inpout32_source_and_bins.zip. This DLL
can be used in Win NT/XP as if it is Win 9X.
NT/XP/Vista, it will install a kernel mode driver and talk to parallel port
through that driver. Therefore, the code doesn't need to be aware of the
operating system under which it is running. The flow chart of the 32-bit
version is given below. The main functions exported from inpout32.dll are:
The interface used by the IDE drives was standardized in 1994 as ANSI
standard X3.221, AT Attachment Interface for Disk Drives. In the
following years, updated versions of the standard were developed, under
the name ATA-1 and ATA-2. In 1994, Western Digital introduced the
Enhanced IDE (EIDE). Other manufacturers introduced their own
variations of ATA-1 such as Fast ATA and Fast ATA-2. The terms IDE
and EIDE have come to be used interchangeably with ATA (now Parallel
ATA). However the terms "IDE" and "EIDE" are at best imprecise. Every
ATA drive is an IDE drive, but not every IDE drive is an ATA drive, as
the term correctly describes
Fig. 9-43(a). Structure of the IDE connector and how it is tied to the HDD.
Fig. 9-43(b). Pin list and cables of ATA connector (40-pin plug and ribbon cables).
The so-called SCSI drives have the drive controllers on board and present
the drive to the host as an array of blocks. There have been several
generations of EIDE drives marketed, compliant with the ATA
specification. A common practice is to refer to a specification version by
the fastest mode it supports. For example, ATA-4 supports Ultra DMA modes
0 through 2, the latter providing a maximum transfer rate of 33 MB/s.
ATA-4 drives are thus sometimes called Ultra DMA-33 (UDMA-33)
drives. Similarly, ATA-6 introduced a transfer speed of 100 MB/s.
The ATA ribbon cables had 40 wires for most of their history (44 for the
small form-factor 2.5" drives), but an 80-wire version appeared with the
introduction of the Ultra DMA/66 (UDMA4) mode. All of the additional
wires in the new cable are ground wires, interleaved with the previously
defined wires to reduce capacitive coupling, and hence crosstalk, between
neighboring signal wires. This was necessary to enable the 66 MB/s
transfer rate of UDMA4 to work reliably. The faster UDMA5 and UDMA6
modes also require 80-conductor cables. Though the number of wires
doubled, the number of connector pins and the pinout remain the same as
on 40-conductor cables, and the external appearance of the connectors is
identical. Internally the connectors are different; the 80-wire cables
usually come with three differently colored connectors (blue - controller,
gray - slave drive, and black - master drive), as opposed to the uniformly
colored connectors of 40-wire cables (all black). The gray connector on
80-wire cables has pin 28 not connected, making it the slave position for
drives.
Fig. 9-44. Serial ATA (SATA) and parallel ATA (PATA) connectors
The SATA standard defines a data cable using seven pins to supply four
conductors shielded with ground supplied by the other three pins.
Transmit pins are connected to receive pins on the other side. The SATA
connector is keyed at pin 7. SATA uses a 4-conductor cable with two
differential pairs (Tx and Rx), plus three additional ground pins and a
separate power connector. SATA runs at transfer rates of 150 MB/s
(SATA/150), 300 MB/s (SATA II), or 600 MB/s (SATA III). Faster SATA
implementations are backward compatible with older devices. 8b/10b
encoding is used for data transfers. The maximum unshielded cable length
is about 1 meter.
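The relation between the SATA transfer rates and the 8b/10b encoding can be checked with a short calculation. The sketch below assumes the SATA line rates of 1.5, 3.0 and 6.0 Gbit/s (figures taken from the SATA generations, not from the text above); `payload_mb_per_s` is a name chosen for this illustration.

```c
#include <stdint.h>

/* Payload rate in MB/s of a serial link that uses 8b/10b encoding:
 * every 8 data bits travel as 10 line bits, so each byte costs 10 bits
 * on the wire. The line rate is given in Mbit/s. */
uint32_t payload_mb_per_s(uint32_t line_rate_mbit_per_s)
{
    return line_rate_mbit_per_s / 10;
}
```

A 1500 Mbit/s link therefore carries 150 MB/s of data, and the 3000 and 6000 Mbit/s generations carry 300 and 600 MB/s.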
USB HDDs are external units containing ATA or SCSI disks with USB
ports on the back allowing very simple expansion and mobility.
Furthermore, USB HDDs are cheap and can work with both Macintosh
and Windows operating systems, without the need of driver installation.
The Maximum transfer speed for external SATA (eSATA or SATA 300)
is 300 MB/s. PATA is 133 MB/s and USB2 is around 60 MB/s. As digital
audio and image/video files are growing in use and size you may need to
upgrade to a larger hard drive. The external hard disks, with USB3
Universal Drive Adapter make it easy to transfer data between the old and
new drives.
Table 9-7. Comparison between serial devices and their data transfer rates
The keyboard returns scan codes rather than ASCII (American Standard
Code for Information Interchange) codes. In addition, all keys are
typematic and generate both a make and a release scan code. For example,
key 1 produces scan code hex 01 on make and code hex 81 on release.
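In the example above the release code equals the make code with bit 7 set (01 becomes 81). Assuming this convention holds for every single-byte scan code, a make/release decoder can be sketched as follows; the function names are chosen for this illustration.

```c
#include <stdint.h>

#define RELEASE_BIT 0x80  /* bit 7 is set in a release (break) code */

/* Returns 1 if the scan code reports a key release, 0 for a key press. */
int is_release(uint8_t scan_code)
{
    return (scan_code & RELEASE_BIT) != 0;
}

/* Returns the make code of the key, whatever the direction of the event. */
uint8_t make_code_of(uint8_t scan_code)
{
    return (uint8_t)(scan_code & 0x7F);
}
```

Both 01 and 81 map to the same key (make code 01); only `is_release` tells the two events apart.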
Normal Characters
Main byte (low byte) ASCII code
Aux byte (high byte) SCAN code
Special Characters
Main byte (low byte) 00(zero)
Aux byte (high byte) Scan-code or special code
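The two-byte layout above can be unpacked with a few shifts and masks. The sketch assumes that both bytes arrive packed in one 16-bit word, with the main byte in the low half and the aux byte in the high half (as in the AX register after a BIOS keyboard read); `key_info` and `decode_key` are illustrative names.

```c
#include <stdint.h>

typedef struct {
    uint8_t ascii;       /* main byte (low byte): ASCII code, or 0     */
    uint8_t scan;        /* aux byte (high byte): scan or special code */
    int     is_special;  /* 1 when the main byte is 0                  */
} key_info;

/* Splits a 16-bit key word into its main and aux bytes. */
key_info decode_key(uint16_t key_word)
{
    key_info k;

    k.ascii = (uint8_t)(key_word & 0xFF);
    k.scan = (uint8_t)(key_word >> 8);
    k.is_special = (k.ascii == 0);
    return k;
}
```

For instance, the word 1E61 hex decodes to the normal character 'a' (ASCII 61, scan code 1E), while 3B00 hex decodes to a special key with scan code 3B (F1).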
The keyboard controller chip, inside the keyboard, scans the key matrix,
and when a key is pressed it sends the scan code for that key to the
keyboard interface circuit on the computer's system board. Therefore,
each key on a PC keyboard has a scan code in addition to the ASCII code
associated with it. The following table indicates the scan code for each
key on a 101-key PC keyboard.
Scan   Base        Upper    Scan   Base        Upper    Scan   Base    Upper
code   case        case     code   case        case     code   case    case
10     q           Q        11     w           W        12     e       E
13     r           R        14     t           T        15     y       Y
16     u           U        17     i           I        18     o       O
19     p           P        1A     [           {        1B     ]       }
2B     \           |        3A     Caps Lock   na       1E     a       A
1F     s           S        20     d           D        21     f       F
22     g           G        23     h           H        24     j       J
25     k           K        26     l           L        27     ;       :
28     '           "        2B     #           ~        1C     Enter   Enter
2A     Left Shift  na       D5     \           |        2C     z       Z
2D     x           X        2E     c           C        2F     v       V
30     b           B        31     n           N        32     m       M
33     ,           <        34     .           >        35     /       ?
4F     Keypad 1    End         E0,35  Keypad /      Keypad /      48  Keypad 8  Up Arrow
4C     Keypad 5    na          50     Keypad 2      Dn Arrow      52  Keypad 0  Insert
E0,37  Keypad *    Keypad *    49     Keypad 9      Pg Up         4D  Keypad 6  Right Arrow
51     Keypad 3    Pg Dn       53     Keypad .      Delete        4A  Keypad -  Keypad -
4E     Keypad +    Keypad +    E0,1C  Keypad Enter  Keypad Enter  01  Escape    Escape
3B     F1                      3C     F2                          3D  F3
3E     F4                      3F     F5                          40  F6
41     F7                      42     F8                          43  F9
44     F10                     D9     F11                         DA  F12
2A,37  Prnt Scrn   na          46     Scroll Lock   na
6. The 8048 pulls the clock line low to inhibit any transmission from the
keyboard.
7. The 8048 pulls the data line low to get the keyboard's attention to the
fact that it wants to transmit.
8. The 8048 releases the CLOCK line and waits for the keyboard to pull
the CLOCK line low. When the CLOCK line has been pulled low the
8042 places its first bit of data on the DATA line.
9. The keyboard toggles the clock line and clocks data across on the
DATA line. The 8048 controller will place a new bit on the data line each
time the CLOCK is pulled LOW by the keyboard.
The AT keyboard connector has 5 pins. Pin 3 is called Reset, but it is
reserved, so we cannot use it. Open-collector outputs drive the clock and
data pins, so when these lines are not driven low they are pulled up to
5 V. Here is the wiring assignment for these connectors:
i. The XT/ AT (5-Pin) Connector
1 CLK
2 DATA
3 NOT RESET
4 GND
5 +5V
Fig. 9-53. Schematic of the PS2 keyboard interface, showing the 8048 controller
Optical and laser mice use a light beam that shines out of the bottom of
the device and is reflected back into the mouse. A reflective mouse
mat with a series of lines on it is required, and photocells inside the
mouse detect the movement of the device from these lines.
There are other mouse devices that have additional inputs and may report
data differently. One popular extension is the Microsoft Intellimouse,
which supports the standard inputs in addition to scrolling wheel and two
additional buttons. The PS/2 mouse sends data to the host PC using the
following 3-byte frame:
Table 9-10. Mouse data packets
The motion values are 9-bit 2's complement integers, where the most
significant bit appears as a "sign" bit in byte 1 of the movement data
packet. Their value represents the mouse's offset relative to its position
when the previous packet was sent, in units defined by the current
resolution. The range of values that can be expressed is -256 to +255.
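Reassembling the 9-bit motion values can be sketched as follows. The byte layout is an assumption based on the standard PS/2 mouse packet: byte 1 carries the buttons in bits 0 to 2 (left, right, middle) and the X and Y sign bits in bits 4 and 5, while bytes 2 and 3 carry the low 8 bits of the X and Y movement. The type and function names are illustrative.

```c
#include <stdint.h>

typedef struct {
    int dx, dy;               /* movement since the previous packet */
    int left, right, middle;  /* button states, 1 = pressed         */
} mouse_report;

/* Decodes one 3-byte movement packet (layout assumed as stated above). */
mouse_report decode_packet(const uint8_t p[3])
{
    mouse_report r;

    r.left = p[0] & 0x01;
    r.right = (p[0] >> 1) & 0x01;
    r.middle = (p[0] >> 2) & 0x01;
    /* The sign bit is the ninth bit: when set, subtract 2^8 = 256. */
    r.dx = (int)p[1] - ((p[0] & 0x10) ? 256 : 0);
    r.dy = (int)p[2] - ((p[0] & 0x20) ? 256 : 0);
    return r;
}
```

A packet with the X sign bit set and a movement byte of FF hex thus decodes to a movement of -1, as expected for a 2's complement value.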
To call this function, load AX=0000H and call the mouse interrupt INT 33H.
The output of this function (the mouse status) will be written in AX and
BX. AX=0000H means ERROR, AX=FFFFH means OK; the mouse is
installed. BX will also contain the number of mouse buttons.
The function 01H shows the mouse cursor on the screen. The function
02H hides the mouse cursor.
The function 03H returns position and button status. The output of this
function appears in BX, CX and DX (BX holds the button status, with bit 0
set when the left button is pressed and bit 1 set when the right button
is pressed; CX contains the column position; DX contains the row
position). The function 04H will position the mouse cursor. To call this
function put 04H in AX, the mouse column position in CX and the mouse
row position in DX.
The function 09H redefines the shape of the mouse cursor when the
screen is in graphics mode. BX and CX hold the column and row of the
cursor hot spot, and ES:DX should contain a pointer to the cursor bitmap.
Fig. 9-56. Old CRT (analog) monitor and a recent LCD (digital) monitor.
Many modern computers and video cards are fitted with a standard VGA/
SVGA 15-way high-density D socket.
For the original IBM PC, there were two types of video adaptors
(interface cards): a monochrome, character-only adaptor called the
Monochrome Display Adaptor (MDA) and the Color Graphics Adaptor (CGA).
Fig. 9-58. Basic block diagram of monochrome CRT monitor (above) and the
horizontal and vertical scan processes (below).
Basically, the CPU performs most of the work, feeding pixel and text
information to the VGA. So, the VGA card is a simple display adapter
with no processing capability. All the thinking is done by the CPU,
including writing and reading of text, and drawing of simple graphics
primitives like pixels, lines and memory transfers for images. Some
newer accelerator cards include functions for 3D graphics rendering like
polygon shading, coordinate manipulation and texture mapping. Others
provide on-the-fly magnification of video clips so that those MPEG
movies don't appear in a box that's three inches wide and two inches high
on your screen. If we want to record a standard video signal for digital
playback, we have to digitize it at about 640x480 pixels/frame. At a
screen refresh rate of 30 fps (frames per second), and 24-bit/pixel color
depth (16.7 million colors) we get 640x480x30x3=28 MB/s. At that data
rate, a 650MB CDROM would hold only 23 seconds of video! CDROM
reader and hard drive technologies don't allow us to transfer data at such
high rates, so in order to display digital video it is compressed for storage.
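The arithmetic in the paragraph above can be reproduced with two small helpers. This is only a sketch: the function names are illustrative, and the capacity is taken as 650 million bytes (decimal megabytes).

```c
#include <stdint.h>

/* Uncompressed video data rate in bytes per second. */
uint32_t video_bytes_per_s(uint32_t width, uint32_t height,
                           uint32_t fps, uint32_t bytes_per_pixel)
{
    return width * height * fps * bytes_per_pixel;
}

/* Whole seconds of such video that fit in capacity_bytes. */
uint32_t seconds_of_video(uint32_t capacity_bytes, uint32_t bytes_per_s)
{
    return capacity_bytes / bytes_per_s;
}
```

For 640x480 pixels, 30 frames per second and 3 bytes per pixel the rate is 27,648,000 bytes/s (about 28 MB/s), and a 650 MB disc holds 23 whole seconds of it.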
Fig. 9-59. Video Graphic Adaptor of an old IBM PC, with DVI and DFP connectors.
The function 01H sets the starting and ending lines of the screen cursor.
The starting line of cursor (0-7) is put in CH and the ending line of cursor
(0-7) is put in CL. The function 02H sets the cursor position. Put the
display page in BH (in text modes; use 00H in graphic modes) and the
row and column positions in DH and DL, respectively.
The function 05H sets the current display page. To do this put the display
page in AL and call int 10. The function 08H reads a character and its
attributes at current cursor position. To do this put the display page in BH
and call int 10. The output (ASCII code of the character) will be output in
AL and its color attributes will be put in AH.
The function 0CH writes a graphic pixel. In this case you define the
display page in BH, the screen line in DX and the screen column in CX.
Also, the pixel color number should be put in AL, before calling int 10.
Alternatively, one can read a graphics pixel by calling the function 0DH.
This function returns the color number of the specified pixel (row in DX
and column in CX). The return value (pixel color number) will be in AL.
The BIOS video functions are summarized in the following table.
Table 9-13. Summary of BIOS Video functions
The next assignments are recent additions, and are used by some modern
chipsets, which are used on the IBM PC's motherboards. The three
parallel port assignments listed in this table are assigned as LPT1 to LPT3
Table 9-14b. Recent additions for I/O addresses in the IBM PC and compatibles.
9-12. Summary
In this chapter, we presented the general architecture of IBM PC &
compatible microcomputers, which are based on Intel 80x86
microprocessors. The expansion slots and various busses are introduced.
The IBM Personal Computer, commonly known as the IBM PC, is the
original version and progenitor of the IBM PC compatible hardware
platform. The term personal computer was common currency before
1981. However, because of the success of the IBM PC, what had been a
generic term sometimes meant a microcomputer compatible with IBM
PC. The original PC was an IBM attempt to get into the small computer
market then dominated by the Commodore, Atari 8-bit family, Apple II
and Tandy TRS-80s, and various CP/M machines. The IBM PC model
5150 was introduced on August 12, 1981. Rather than going through the
usual IBM design process, a special team was formed with authorization
to bypass the company restrictions and get something to market rapidly.
The team consisted of twelve people headed by Don Estridge and Chief
Scientist Larry Potter. They developed the PC in about a year. To achieve
this they decided to build the machine with "off-the-shelf" parts from a
variety of different original equipment manufacturers (OEMs). IBM also
sold an IBM PC Technical Reference Manual which included a listing of
the BIOS source code. The following figure shows the block diagram of
an 8088-based PC and its support chips.
The expansion slots have so many pins to power expansion cards and to
connect them with data, address and control busses. The expansion slots
are connected with the microprocessor via a group of signal lines (on the
motherboard of a computer) called the expansion bus. The expansion bus
contains a large number of input/output pins, for data and address as well
as control signals and it is usually operated at a frequency lower than the
microprocessor clock. The efficiency of the expansion bus is expressed in
terms of its bandwidth, which is calculated using the following
equation:
Bus bandwidth (in MB/s) = Bus width (in Bytes) x Bus Clock (in MHz)
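The equation can be applied directly, as in the sketch below. It gives the peak theoretical figure for a bus that moves one word per clock; the function name is chosen for this illustration.

```c
/* Peak bus bandwidth in MB/s = bus width in bytes x bus clock in MHz.
 * The width is passed in bits, as it is usually quoted. */
unsigned bus_bandwidth_mb_per_s(unsigned width_bits, unsigned clock_mhz)
{
    return (width_bits / 8) * clock_mhz;
}
```

A 32-bit bus clocked at 33 MHz, such as PCI, gives 4 x 33 = 132 MB/s.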
Interfacing I/O devices to the IBM PC via ISA bus needs, at least, the
connection with the following pins (among the first 62 pins):
For a relatively long time, most PCs on the market had one or more ISA
slots for backward compatibility; however, most expansion cards are now
built using the PCI and PCI-Express interfaces. PCI was developed by
Intel in 1992, but it took some time to get it to work reliably. The PCI
bus has some attractive features, such as concurrent bus-mastering, a
full burst mode, and a type of pipelining queue that can reduce the
number of potential wait states.
Hard disk drives are accessed over one of a number of bus types,
including parallel ATA (PATA, also called IDE or EIDE), Serial ATA
(SATA), SCSI, Serial Attached SCSI (SAS), and Fibre Channel. Bridge
circuitry is sometimes used to connect hard disk drives to buses that they
cannot communicate with natively, such as IEEE 1394 and USB. Below is
a table showing the buses and slots and their maximum bandwidths:
PCI 132 MB/s
AGP 8X 2,100 MB/s
PCI Express 1x 250 MB/s
PCI Express 2x 500 MB/s
PCI Express 4x 1000 MB/s
PCI Express 8x 2000 MB/s
PCI Express 16x 4000 MB/s
PCI Express 32x 8000 MB/s
USB 2.0 (Max Possible) 60 MB/s
IDE (ATA100) 100 MB/s
IDE (ATA133) 133 MB/s
SATA 150 MB/s
SATA II 300 MB/s
Gigabit Ethernet 125 MB/s
IEEE1394B [Firewire 800] ~100 MB/s
9-13. Problems
9-1) ISA stands for
a) International American Standard.
b) Industry Standard Architecture.
c) International Standard Architecture.
d) None of the above.
9-2) IDE disk is connected to the PCI BUS using ______ interface.
a) ISA
b) ISO
c) ANSI
d) IEEE
9-3) ________ is an extension of the processor BUS.
a) SCSI BUS
b) USB
c) PCI BUS
d) None of the above
9-4) _____ provides a separate physical connection to the memory.
a) PCI BUS
b) PCI interface
c) PCI bridge
d) Switch circuit
9-5) The key feature of the PCI BUS is
a) Low cost connectivity.
b) Plug and Play capability.
c) Expansion of Bandwidth.
d) Both a and c.
9-6) The DMA differs from the interrupt mode by
a) The involvement of the processor for the operation
b) The method accessing the I/O devices
c) The amount of data transfer possible
d) Both a and c
9-9) Describe briefly how to design a simple I/O extension card that can
be inserted into an ISA expansion slot to provide a simple 8-bit input
port (via DIP switches) and a simple 8-bit output port (to 8 LEDs or a
7-segment display). Hint: Refer to the circuit in figure 9-11.
9-10) Write an assembly program that reads the input port (A) and outputs
the reading to the output port (B) of the 8255 chip in the above card.
9-12) Consider the parallel port test circuit shown below. Show how to
build the project (pptest), and run the resultant pptest.exe under DOS such
that when you set the dip switches to "11111111", the LED1 to LED8 in
the hardware will glow.
9-13) Design a parallel printer ISA interface card on the basis of the
8255, to drive a stepper motor, via the keyboard. Use a 3-to-8 decoder
(74137) for assigning the appropriate port address.
Hint: Let the right and left arrow keys rotate the motor right and left,
the space bar start the motor and the escape key stop it.
Prof. Dr. Muhammad El-SABA
Introduction to Microprocessors & Interface Circuits CHAPTER 10
Microprocessor Support
Chips & PC Chipsets
Contents
10-1. Introduction
10-2. DMA Controller, Intel 8237 Chip
10-3. UART Chip, Intel 8250A Chip
10-4. USART Chip, Intel 8251A Chip
10-5. UART Chip, Intel 16550 Chip
10-6. Programmable Interval Timer (PIT), Intel 8253 Chip
10-7. Programmable Peripheral Interface (PPI), Intel 8255 Chip
10-8. Programmable Interrupt Controller (PIC), Intel 8259 Chip
10-9. Keyboard / Display Controller, Intel 8279 Chip
10-10. Bus controller, Intel 8288 Chip
10-11. Bus Arbiter, Intel 8289 Chip
10-12. CRT Controller, Intel 8275 & Motorola 6845 Chips
10-13. Graphic Processing Units (GPU) & Graphic Accelerators
10-14. IBM PC Chipsets
10-15. Case Study: Intel 82845 Chipset
10-15.1. North-Bridge Chipset (MCH)
10-15.2. South-Bridge Chipset (ICH)
10-16. Intel DG965 Chipsets
10-17. AMD 690G Chipsets
10-18. Intel 5-Series (Core i7) Chipsets
10-18. Intel 6- Series and 7-Series Chipsets
10-19. Apple PC Chipsets
10-20. Intel Z170 (Skylake) Chipset
10-21. Summary
10-1. Introduction
In order to build a complete microprocessor system, various support chips
are needed in conjunction with the microprocessor. When Intel
introduced its 80186 processor in 1982, which was designed as an
embedded solution, Intel built the clock generator, DMA channels,
interrupt controller, timers, and wait-state generator into the CPU.
However, since the introduction of 80286, the opinion prevailed to
integrate such functions in a chipset. In 1999, Intel launched a family of
compatible 800-series chipsets. The first of these was the 810 chipset.
Fig. 10-1. Block diagram of the IBM PC, showing the 8088 CPU and support chips.
The system chipset is the set of logic circuits that forms the
intelligence of the computer. It usually controls data transfers between
the CPU, cache, and system buses. Since data flow is such a critical
issue, the chipset is one of the components with a major impact on
computer performance. In this chapter we describe the best-known support
chips for x86 CPUs and the IBM PC.
There are additional signals with the names HRQ (Hold Request), HLDA
(Hold Acknowledge), EOP (End of Process), and the bus control signals
MEMR (Memory Read), MEMW (Memory Write), IOR (I/O Read), and
IOW (I/O Write).
The 8237 DMA is known as a fly-by controller. This means that the data
being moved from one location to another does not pass through the
DMA chip and is not stored in the DMA chip. Consequently, the DMA
can only transfer data between an I/O port and memory, but not between
two I/O ports or two memory locations.
10-3. UART, Intel 8250A Chip
The IBM PC uses the Intel 8250A UART (universal asynchronous
receiver and transmitter) as the interface unit for serial communication
with the outside world. The 8250A UART is a special interface unit
that converts parallel information to serial information and vice versa.
The UART is used to connect the IBM PC to serial devices like
modems (modulator/demodulators) and fax/modem cards. Over the
years, since the introduction of the DOS computer, several types of
UART have been used in this hardware. The first was the 8250 chip,
followed by the 8250A, then the 16450, and then the 16550, which is the
one found in more recent PCs. These chips are pin compatible, but differ
in performance. The following table indicates the performance of these
various chips.
Table 10-2. Max data rate of famous UART chips.
The UART chip has a total of 12 different registers that are mapped into 8
different Port I/O locations. Yes, you read that correctly: 12 registers in 8
locations. Obviously that means there is more than one register that uses
the same Port I/O location, and affects how the UART can be configured.
In reality, two of the registers are really the same one but in a different
context, as the Port I/O address that you transmit the characters to be sent
out of the serial data port is the same address that you can read in the
characters that are sent to the computer. Another I/O port address has a
different context when you write data to it than when you read data from
it... and the number will be different after writing the data to it than when
you read data from it. More on that in a little bit.
One of the issues that came up when this chip was originally being
designed was that the designer needed to be able to send information
about the baud rate of the serial data with 16 bits. This actually takes up
two different registers and is toggled by what is called the Divisor Latch
Access Bit (DLAB). When the DLAB is set to "1", the baud rate registers
can be set and when it is "0" the registers have a different context. Does
all this sound confusing? Maybe, but let us take it step by step. The
following is a table of each of the registers that can be found in a typical
UART chip:
The "x" in the DLAB column means that the status of the DLAB has no
effect on what register is going to be accessed for that offset range.
Notice also that some registers are Read only. If you attempt to write data
to them, you may end up with either some problems with the modem
(worst case), or the data will simply be ignored (typically the result).
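The 16-bit baud rate value mentioned above is a divisor that is split across the two divisor latch registers. The sketch below assumes the standard PC arrangement, in which a 1.8432 MHz UART clock divided by 16 gives a maximum rate of 115200 baud; the names `divisor_bytes` and `baud_divisor` are illustrative.

```c
#include <stdint.h>

#define UART_MAX_BAUD 115200UL  /* 1.8432 MHz clock / 16 (assumed) */

typedef struct {
    uint8_t dll;  /* divisor latch low byte  (offset 0 while DLAB = 1) */
    uint8_t dlm;  /* divisor latch high byte (offset 1 while DLAB = 1) */
} divisor_bytes;

/* Splits the 16-bit divisor for the requested baud rate into the two
 * bytes that are written to the divisor latch registers. */
divisor_bytes baud_divisor(unsigned long baud)
{
    unsigned long divisor = UART_MAX_BAUD / baud;
    divisor_bytes d;

    d.dll = (uint8_t)(divisor & 0xFF);
    d.dlm = (uint8_t)((divisor >> 8) & 0xFF);
    return d;
}
```

For 9600 baud the divisor is 12 (low byte 0C hex, high byte 00). For 300 baud it is 384, which no longer fits in one byte: the low byte is 80 hex and the high byte is 01.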
As mentioned earlier, some registers share a Port I/O address where one
register will be used when you write data to it and another register will be
used to retrieve data from the same address. Each serial communication
port will have its own set of these registers. For example, if you wanted to
access the Line Status Register (LSR) for COM1, and assuming the base
I/O Port address of 03F8, the I/O Port address to get the information in
this register would be found at 03F8 + 05 or 03FD. Some example code
would be like this:
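A minimal sketch of such code, in C, is given below. It assumes COM1 at base address 03F8 and uses the two LSR status bits most often polled (bit 0, data ready, and bit 5, transmitter holding register empty); the macro and function names are chosen for this illustration.

```c
#include <stdint.h>

#define COM1_BASE   0x03F8  /* conventional base address of COM1   */
#define LSR_OFFSET  5       /* offset of the Line Status Register  */

#define LSR_DATA_READY 0x01 /* bit 0: a received byte is waiting            */
#define LSR_THR_EMPTY  0x20 /* bit 5: transmitter holding register is empty */

/* I/O address of a UART register, from the port base and the offset. */
uint16_t uart_reg(uint16_t base, uint8_t offset)
{
    return (uint16_t)(base + offset);
}

/* Tests on a value that was read from the LSR. */
int data_ready(uint8_t lsr)   { return (lsr & LSR_DATA_READY) != 0; }
int can_transmit(uint8_t lsr) { return (lsr & LSR_THR_EMPTY) != 0; }
```

Under DOS, the register itself could then be read with the port input function shown in the previous chapter, for example `lsr = _inp(uart_reg(COM1_BASE, LSR_OFFSET));`.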
The Interrupt Enable Register allows us to control when and how the
UART is going to trigger an interrupt event with the hardware interrupt
associated with the serial COM port. If used properly, this can enable an
efficient use of system resources and allow you to react to information
being sent across a serial data line in real-time. The point here is that you
can use the UART to let you know exactly when you need to extract
some data. This register has both read and write access. The following
table shows each bit in this register and the event that it enables, so
that you can check on the status of this chip:
Bit Notes
7 Reserved
6 Reserved
5 Enables Low Power Mode (16750)
4 Enables Sleep Mode (16750)
3 Enable Modem Status Interrupt
2 Enable Receiver Line Status Interrupt
1 Enable Transmitter Holding Register Empty Interrupt
0 Enable Received Data Available Interrupt
The Received Data interrupt is a way to let you know that there is some
data waiting for you to pull off of the UART. This is probably the one bit
that you will use more than the rest, and has more use.
The Transmitter Holding Register Empty Interrupt is to let you know
that the output buffer (on more advanced models of the chip like the
16550) has finished sending everything that you pushed into the buffer.
This is a way to streamline the data transmission routines so they take up
less CPU time.
The Receiver Line Status Interrupt indicates that something in the LSR
register has probably changed. This is usually an error condition, and if
you are going to write an efficient error handler for the UART that will
give plain text descriptions to the end user of your application, this is
something you should consider.
The Modem Status Interrupt notifies you when something changes with
an external modem connected to your computer. This includes the
telephone bell ringing, that you have successfully connected to another
modem (Carrier Detect has been turned on), or that somebody has hung
up the telephone (Carrier Detect has turned off). It can also help you to
know if the external modem or data equipment can continue to receive
data (Clear to Send). Essentially this deals with the other wires in the RS-
232 standard other than strictly the transmit and receive wires. The other
two modes are strictly for the 16750 chip, and help put the chip into a low
power state for use on battery-powered laptop computers or embedded
controllers. On earlier chips you should treat these bits as "Reserved", and
only put a "0" into them.
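The bit table above can be turned into a small helper that builds the value to write into the Interrupt Enable Register. This is only a sketch in Python: the constant names and the function ier_value are hypothetical, and the bit positions are the ones listed in the table.

```python
# Bit masks taken from the IER table above.
IER_RX_DATA_AVAILABLE = 0x01  # bit 0
IER_THR_EMPTY = 0x02          # bit 1
IER_RX_LINE_STATUS = 0x04     # bit 2
IER_MODEM_STATUS = 0x08       # bit 3
IER_SLEEP_MODE = 0x10         # bit 4 (16750 only)
IER_LOW_POWER_MODE = 0x20     # bit 5 (16750 only)


def ier_value(flags, chip_is_16750=False):
    """Combine interrupt-enable flags into one register value.

    Bits 6-7 are always rejected; bits 4-5 are rejected unless the
    caller states that the chip is a 16750.
    """
    allowed = 0x0F | (0x30 if chip_is_16750 else 0x00)
    value = 0
    for flag in flags:
        value |= flag
    if value & ~allowed:
        raise ValueError("attempt to set a reserved IER bit")
    return value


# Enable "received data available" and "receiver line status" interrupts.
print(f"{ier_value([IER_RX_DATA_AVAILABLE, IER_RX_LINE_STATUS]):02X}h")
```

Rejecting reserved bits, rather than silently clearing them, is a design choice of this sketch; it makes an accidental write of a 16750-only bit on an older chip visible immediately.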
[Block diagram. Bus side: D0-D7 through a 74LS244 buffer, IOR, IOW, A0-A2, A3-A9 and AEN to the chip selects (CS0, CS1, CS2), MR, and INTRPT to IRQ3. Serial side: SOUT (TxD), SIN (RxD), RTS, DTR, CTS, DSR, RLSD, RI, OUT1 and OUT2 through the RS-232 driver. Clock: 18.4322 MHz crystal, divided by 10, to Xin/Xout.]
Fig. 10-2(a). The 8250 UART chip block diagram, and how it is connected in a PC.
Fig. 10-2(b). Pin-out diagram of the 8250 and 16550 UART Chips
Fig. 10-3(a) Pin-out diagram of the 8253/8254 programmable interval timer (PIT).
Fig. 10-3(b). Functional block diagram of the 8253/54 programmable interval timer.
Bit     | D7  D6     | D5  D4         | D3  D2  D1           | D0
Field   | SC1 SC0    | RW1 RW0        | M2  M1  M0           | B/BCD
Meaning | Channel ID | Read/Load Mode | Count Mode Selection | 0 = Binary, 1 = BCD
Fig. 10-3(c). Bits of the control register of the 8254 programmable interval
timer/counter (PIT).
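The bit fields in Fig. 10-3(c) can be packed into a control byte as sketched below. The function name pit_control_word and its parameter names are hypothetical; the field positions are those shown in the figure. The Read/Load encoding (1 = low byte only, 2 = high byte only, 3 = low byte then high byte, 0 = counter latch) is the usual 8253/8254 convention and is stated here as an assumption.

```python
def pit_control_word(channel, read_load, mode, bcd=False):
    """Pack SC1-SC0, RW1-RW0, M2-M0 and the BCD flag into one byte."""
    if not 0 <= channel <= 2:
        raise ValueError("channel must be 0, 1 or 2")
    if not 0 <= read_load <= 3:
        raise ValueError("read_load must fit in two bits")
    if not 0 <= mode <= 5:
        raise ValueError("mode must be 0..5")
    return (channel << 6) | (read_load << 4) | (mode << 1) | int(bcd)


# Channel 0, load low byte then high byte, mode 3, binary counting.
print(f"{pit_control_word(0, 3, 3):02X}h")
```

The resulting byte would be written to the control register of the timer before the count itself is loaded into the selected channel.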
The 8255 chip contains four registers: one control register and three
data registers, one for each of its three ports (PA, PB and PC). Not
all ports are necessarily doing I/O. When the CPU executes an IN or
OUT instruction, the two address bits (A0-A1) specify which of the four
registers will be accessed. Thus a connection between the CPU and the
I/O device can be established. As we have explained in chapter 8, the
8255A has three operation modes:
1- Basic I/O mode: All 3 ports can be programmed as input or output
ports. When the CPU does an output, the value is stored in the register;
the 8255 latches it (holds it constantly) and drives it onto its output pins
until the CPU writes new data to the register (the CPU won't write new data
until the I/O device receives the previous item). The PPI does not latch its
input; the CPU can only read a value while the external device is sending it.
So, the CPU reads the value from the port's data register.
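The register selection and the basic I/O mode can be sketched as follows. All names (PPI_REGISTERS, select_register, mode0_control_word) are hypothetical. The control-word layout used here (D7 = 1 for mode set, D4 = port A direction, D3 = upper half of port C, D1 = port B, D0 = lower half of port C, with 1 meaning input) is the standard 8255A mode-set format and is stated as an assumption, since the text above does not spell it out.

```python
# Register selected by address bits A1 A0 (00, 01, 10, 11).
PPI_REGISTERS = ("port A", "port B", "port C", "control")


def select_register(a1, a0):
    """Return which 8255 register the two address bits select."""
    return PPI_REGISTERS[(a1 << 1) | a0]


def mode0_control_word(pa_input, pb_input, pc_upper_input, pc_lower_input):
    """Build a mode-set control word with both groups in basic I/O mode."""
    word = 0x80                       # D7 = 1: mode-set command, modes left at 0
    word |= int(pa_input) << 4        # D4: port A direction
    word |= int(pc_upper_input) << 3  # D3: port C upper nibble direction
    word |= int(pb_input) << 1        # D1: port B direction
    word |= int(pc_lower_input)       # D0: port C lower nibble direction
    return word


print(select_register(1, 1))
print(f"{mode0_control_word(False, False, False, False):02X}h")  # all outputs
print(f"{mode0_control_word(True, False, False, False):02X}h")   # port A input
```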
The 8259 interrupt controller chip can perform the following functions:
Fig. 10-5(c). Connection of the 8259 chip to the 8088 microprocessor in IBM PC.
The 8259 has many modes of operation. In the simplest mode, the chip is
programmed by an initialization sequence of 3 bytes, in much the same way
as you program the 8255 chip. These 3 bytes assign the interrupt
request lines (IRQs) to certain interrupt vectors and define the triggering
mode of the interrupt inputs (edge or level). After
initialization, the 8259 is ready to accept up to 8 interrupt requests
and to raise its INT output pin. The IRQ table changes a little from one
IBM PC to another, according to the installed peripheral devices. The
table below depicts the IRQs of a recent IBM PC.
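As an illustration of the 3-byte initialization, the sketch below builds the three initialization command words (ICW1, ICW2, ICW4) for a single 8259 in 8086/8088 mode. The function names are hypothetical. The port addresses 20h/21h and the vector base 08h are the conventional IBM PC values and are assumptions here, as is the ICW bit layout, which follows the standard 8259A format. The sketch only computes the bytes; it does not write to any port.

```python
PIC_COMMAND_PORT = 0x20  # assumed IBM PC address (A0 = 0)
PIC_DATA_PORT = 0x21     # assumed IBM PC address (A0 = 1)


def pic_init_sequence(vector_base, level_triggered=False, buffered=True):
    """Return the (port, value) writes that initialize a single 8259."""
    if vector_base & 0x07:
        raise ValueError("vector base must be a multiple of 8")
    # ICW1: D4 = 1, D3 = level/edge, D1 = single chip, D0 = ICW4 follows.
    icw1 = 0x10 | (0x08 if level_triggered else 0x00) | 0x02 | 0x01
    # ICW2: upper five bits of the interrupt vector number.
    icw2 = vector_base
    # ICW4: D0 = 8086/8088 mode, D3 = buffered mode.
    icw4 = 0x01 | (0x08 if buffered else 0x00)
    return [
        (PIC_COMMAND_PORT, icw1),
        (PIC_DATA_PORT, icw2),
        (PIC_DATA_PORT, icw4),
    ]


def vector_for_irq(vector_base, irq):
    """Interrupt vector number raised by a given IRQ line (0..7)."""
    if not 0 <= irq <= 7:
        raise ValueError("irq must be 0..7")
    return vector_base + irq


for port, value in pic_init_sequence(0x08):
    print(f"OUT {port:02X}h, {value:02X}h")
print(f"IRQ3 -> vector {vector_for_irq(0x08, 3):02X}h")
```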
Fig. 10-6(c). Circuit diagram illustrating the use of the 8279 controller to drive 8
multiplexed 7-segment display units.
Fig. 10-6(d). Circuit diagram illustrating the use of the 8279 controller to drive a 8x8
matrix keypad.
For instance, the 8288 uses the status signals S0, S1, S2 (pins 26-28 of
the 8086/8088 in maximum mode) to generate the control signals
(ALE, MEMRC, MEMWC, IORC, IOWC, INTA, and other signals),
which facilitate the use of other support chips.
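The decoding performed by the 8288 can be sketched as a lookup on the three status bits. The table below uses the standard 8086/8088 maximum-mode status encoding, quoted here as an assumption because Table 10.6 is not reproduced in this section; the names BUS_CYCLES and decode_status are hypothetical.

```python
# Bus cycle type indexed by the status bits, in the order S2 S1 S0.
BUS_CYCLES = {
    0b000: "interrupt acknowledge",
    0b001: "I/O read",
    0b010: "I/O write",
    0b011: "halt",
    0b100: "instruction fetch",
    0b101: "memory read",
    0b110: "memory write",
    0b111: "passive (no bus cycle)",
}


def decode_status(s2, s1, s0):
    """Return the bus cycle that the status lines announce."""
    return BUS_CYCLES[(s2 << 2) | (s1 << 1) | s0]


print(decode_status(0, 0, 1))
print(decode_status(1, 1, 0))
```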
Table 10.6. Input status and output control signals of the 8288 chip.
Fig. 10-8. Connection of the 8289 bus arbiter to the 8088 in maximum mode.
Two bytes are fetched from the display buffer in 553ns, providing a data
rate of 1.8M byte/sec. The monitor adapter supports 256 different
character codes. An 8K-byte character generator contains the fonts for the
character codes.
The CGA has three modes available within the graphics mode. They are
low-resolution color graphics, medium-resolution color graphics, and
high-resolution color graphics. However, only medium- and high-
resolution graphics are supported in ROM. The following table
summarizes the three modes.
Fig. 10-10(a).. Interfacing the 8088 CPU to peripheral devices via the 8742 UPI.
The term GPU was popularized by Nvidia in 1999, which marketed the
GeForce 256 as "the world's first GPU". Many companies, such as Intel,
Nvidia, AMD and ATI, have produced GPUs under a number of brand
names. GPUs are used in embedded systems, mobile phones, personal
computers, workstations, and game consoles.
The GPUs of the most powerful class typically interface with the
motherboard by means of an expansion slot such as PCI Express (PCIe)
or Accelerated Graphics Port (AGP).
Fig. 10-11. Block diagram of the 82485 chipset, showing the North-Bridge (MCH)
and the South-Bridge (ICH2) chipsets, as connected with the P 4 and other devices.
The North-Bridge chipset connects the CPU front-side data bus1 with
main memory as well as other high-speed devices and ports, like the
AGP. The PCI bus is also connected to the north-bridge chipset as a
mezzanine stage, to link different types of busses. As shown, in figure 10-
12, the PCI bus is connected to the rest of the system via the South-
Bridge chipset. The south bridge chipset can convey signals between the
PCI bus and the ISA bus as well as other slow devices. For instance, the
floppy disk drive (FDD), the enhanced IDE (EIDE) channels, the
keyboard, the mouse and one or more serial ports may be connected via
this chipset. However, in most cases the system chipset does not integrate
all of the circuitry needed by the motherboard.
The Super I/O chip, which handles the serial ports, parallel port,
USB ports, floppy disks, and sometimes the IDE hard disks.
SCSI controllers (for SCSI hard disks & devices) and those found
in video, sound, and network cards.
1 As we stated before, recent microprocessors have a wider external data bus (called the front-side bus) to
connect them to main memory.
The LPC (Low Pin Count) controller connects the I/O
controller to the South-Bridge. It controls the PS/2, LPT and IR ports, the
diskette controller, the USB controller (4 ports), the AC97 sound controller (6
channels), and a system management bus (SMBUS) controller. As shown in
the figure, the inter-bridge connection is quad data rate (QDR), and
communicates via a bus called the "Hub Interface". Figure 10-14 depicts the
reference design for the motherboards that are based on the i845D.
A request to allocate 256k of
memory for the graphics card to access via the AGP bus could result in
several small, non-contiguous chunks of main memory being allocated.
To make this appear as one
256k, contiguous piece of memory to the graphics card, the chipset has
something called a Graphics Address Re-Mapping Table (GART). The
GART maps a linear range of virtual memory addresses to multiple, 4k,
physical addresses in main memory. The amount of memory for
remapping by the GART is often determined by a setting in the
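The remapping idea can be sketched with a small table of 4k page addresses. This is an illustrative model only: the class Gart, the aperture base E0000000h and the scattered page addresses are all hypothetical values chosen for the example, not taken from any chipset.

```python
PAGE_SIZE = 4 * 1024  # the GART maps in 4k pages


class Gart:
    """Toy model: a linear aperture backed by scattered physical pages."""

    def __init__(self, aperture_base, physical_pages):
        for page in physical_pages:
            if page % PAGE_SIZE:
                raise ValueError("physical pages must be 4k aligned")
        self.aperture_base = aperture_base
        self.physical_pages = list(physical_pages)

    def translate(self, address):
        """Map an address inside the aperture to a physical address."""
        offset_in_aperture = address - self.aperture_base
        size = len(self.physical_pages) * PAGE_SIZE
        if not 0 <= offset_in_aperture < size:
            raise ValueError("address outside the remapped range")
        index, offset = divmod(offset_in_aperture, PAGE_SIZE)
        return self.physical_pages[index] + offset


# 64 non-contiguous pages = one 256k region as seen by the graphics card.
scattered_pages = [0x00100000 + i * 0x3000 for i in range(64)]
gart = Gart(0xE0000000, scattered_pages)
print(hex(gart.translate(0xE0001234)))
```

The graphics card only ever sees the linear range starting at the aperture base; the table lookup hides the fact that consecutive 4k pages live at unrelated physical addresses.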
Fig. 10-16. Block diagram of the Intel DG965 chipset, showing the North-Bridge
(MCH) and South-Bridge (ICH8) chipsets.
The North Bridge ensures proper operation with any contemporary Socket
AM2 processor. It is also responsible for the PCI-Express
expansion cards and the High Definition Audio codec. It also contains
an integrated graphics adapter. The SB600 South Bridge supports the PCI bus, 10
USB 2.0 ports, 4 Serial ATA (SATA) channels (with RAID 0, 1 and 10
arrays) and one IDE channel.
The key feature of the AMD 690 chipset family is the powerful integrated
graphics core from the Radeon X1200 Series. There are two models in
this family: AMD 690G and 690V. The top 690G solution features a
Radeon X1250 graphics core and supports HDMI output, while the AMD
690V features an integrated
Fig. 10-17. AMD G690 chipset, connected with AMD processors and other devices.
The forefather of the Radeon X1250 was the Radeon X700 released in
2004. The major enhancement is the support of Avivo (advanced features
for media content processing, including HD-video). The clock speed of
Radeon X1250 is 400MHz, and up to 1GB of system memory can be
allocated for the needs of the graphics subsystem. AMD 690G was the
industry's first chipset supporting HDMI (High-Definition Multimedia
Interface) output. It has two monitor connectors, including one DVI. Moreover, AMD
690G supports SurroundView function that allows connecting four
monitors to the system with an additional discrete PCIe video card. The
following table recapitulates some of the most recent chipsets for Intel
and AMD x86 machines and their characteristics.
Fig. 10-19. Difference between Nehalem processors (such as Corei7) and previous
Intel microprocessors and their chipsets
10-20. Case Study: Intel's 6- And 7-Series Chipsets (Core i9) Chipset
The arrival of Sandy Bridge-based Core CPUs coincided with Intel's
Cougar Point chipset launch in 2011. The most feature-rich solution was
Intel's Z68, which boasted eight PCIe 2.0 lanes, two SATA 6Gb/s ports,
four SATA 3Gb/s ports, and 14 USB 2.0 ports. It also featured support
for overclocking, RAID 0/1/10, and the ability to split PCIe connectivity
up between multiple GPUs. Later, with the introduction of the Panther
Point platforms and Ivy Bridge processors, Intel integrated a USB 3.0 controller
into its PCHs. All 7-series chipsets can support up to four USB 3.0 ports
as a result. Ivy Bridge CPUs also saw the introduction of PCIe 3.0.
Fig. 10-20. Apple Xserve chipset with the Xeon microprocessor modules.
The chipset will feature up to 20 PCIe Gen 3 lanes, 6 SATA Gen 3 ports,
10 USB 3.0 ports out of a total of 14 USB ports (USB 3.0 / USB 2.0),
up to 3 SATA Express capable ports, and up to 3 Intel RST capable PCIe
storage ports, which may include x2 SATA Express or M.2 SSD ports with
Enhanced SPI, plus x4/x8/x16 capable Gen 3 PCI-Express support from
the processor. Aside from that, we know that the Skylake processors
would be compatible with the latest LGA 1151 socketed boards.
10-23. Summary
In this chapter we recapitulated the most famous support chips, in the
IBM PC. In particular, we described the system chipset, which is a set of
integrated computer support chips. Thus a chipset is the logic circuit that
collects the intelligence of the PC motherboard. Chipsets control data
transfers between the processor, cache, system buses, basically
everything inside the PC. Since data flow is such a critical issue, the
chipset is one of the most important components in the PC. The
following figure depicts the general structure of the PC chipset, which is
composed of two basic ICs, namely: the north-bridge (IOH) and the
south-bridge (ICH).
The north bridge (IOH) contains the fast system I/O interface circuits and
the south bridge (ICH) usually handles the slower transfer devices. Some
manufacturers combined the two ICs in a single monolithic chipset, but
the solution of two ICs (south and north bridges) is still widely adopted in
the PC industry.
Pentium 4 Chipsets
Core2 Chipsets
All Core2 Duo chipsets support the Pentium Dual-Core and Celeron
processors based on the Core architecture. Support for all NetBurst based
processors was officially dropped starting with the P35 chipset family.
B43 | ICH10D | 2008 | 45 | Core2Quad / Core2 Duo | 800/1066/1333 | DDR3 800/1066, DDR2 667/800 | 16 GB | 1 PCI-E 16× 2.0 | MAX 4500
LGA 1155
Chipsets supporting LGA 1155 CPUs.
Chipset | Date | Bus Interface | Bus Speed | PCI Express lanes | SATA | USB | TDP (W)
H61 | 2011 | DMI 2.0 | 4 GB/s | 6 PCI-E 2.0 | 3 Gbit/s, 4 Ports | Rev 2.0, 10 Ports | 6.1
P67 | 2011 | DMI 2.0 | 4 GB/s | 8 PCI-E 2.0 | 6 Gbit/s, 2 Ports & 3 Gbit/s, 4 Ports | Rev 2.0, 14 Ports | 6.1
LGA 1156
Chipsets supporting LGA 1156 CPUs.
Chipset | Date | Bus Interface | Bus Speed | PCI Express lanes | SATA | USB | TDP (W)
P55 | 2009 | DMI | 2 GB/s | 8 PCI-E 2.0 at 2.5 Gbit/s | 3 Gbit/s, 6 Ports | Rev 2.0, 14 Ports | 4.7
H55 | 2010 | DMI | 2 GB/s | 6 PCI-E 2.0 at 2.5 Gbit/s | 3 Gbit/s, 6 Ports | Rev 2.0, 12 Ports | 5.2
H57 | 2010 | DMI | 2 GB/s | 8 PCI-E 2.0 at 2.5 Gbit/s | 3 Gbit/s, 6 Ports | Rev 2.0, 14 Ports | 5.2
Q57 | 2010 | DMI | 2 GB/s | 8 PCI-E 2.0 at 2.5 Gbit/s | 3 Gbit/s, 6 Ports | Rev 2.0, 14 Ports | 5.1
Chipset | Date | Bus Interface | Bus Speed | PCI Express lanes | SATA | USB | TDP (W)
X58 | 2008 | QPI | Up to 25.6 GB/s | 36 PCI-E 2.0 at 5 Gbit/s (IOH); 6 PCI-E 1.1 (ICH) | 3 Gbit/s, 6 Ports | Rev 2.0, 12 Ports | 28.6
X79 | 2011 | DMI 2.0 | 4 GB/s | 8 PCI-E 2.0 | 6 Gbit/s, 2 Ports & 3 Gbit/s, 4 Ports | Rev 2.0, 14 Ports | 7.8
10-24. Problems
10-1) Consider the parallel port application shown below. Show how to
build the project under DOS such that when you turn on the relay switch,
the appliance is connected to the AC mains.
        OUT DX,AL
AGAIN:
        MOV AL,PHASEC       ; output the phase C pattern
        MOV DX,PORTC
        OUT DX,AL
        MOV CX,0FFFFH       ; software delay between steps
UP:
        LOOP UP
        MOV AL,PHASEB       ; output the phase B pattern
        MOV DX,PORTC
        OUT DX,AL
        MOV CX,0FFFFH
UP1:
        LOOP UP1
        MOV AL,PHASED       ; output the phase D pattern
        MOV DX,PORTC
        OUT DX,AL
        MOV CX,0FFFFH
UP2:
        LOOP UP2
        MOV AL,PHASEA       ; output the phase A pattern
        MOV DX,PORTC
        OUT DX,AL
        MOV CX,0FFFFH
UP3:
        LOOP UP3
        JMP AGAIN           ; REPEAT OUTPUT SEQUENCE
        INT 03H
        END START
        MOV AL,PHASEC       ; output the phase C pattern
        MOV DX,PORTC
        OUT DX,AL
        MOV CX,0FFFFH       ; software delay between steps
UP:
        LOOP UP
        MOV AL,PHASEA       ; output the phase A pattern
        MOV DX,PORTC
        OUT DX,AL
        MOV CX,0FFFFH
UP1:
        LOOP UP1
        MOV AL,PHASED       ; output the phase D pattern
        MOV DX,PORTC
        OUT DX,AL
        MOV CX,0FFFFH
UP2:
        LOOP UP2
        MOV AL,PHASEB       ; output the phase B pattern
        MOV DX,PORTC
        OUT DX,AL
        MOV CX,0FFFFH
UP3:
        LOOP UP3
        JMP AGAIN           ; REPEAT OUTPUT SEQUENCE
        INT 03H
        END START
Procedure
1. Connect power supply 5V & GND to both microprocessor trainer kit &
Stepper motor interfacing kit.
2. Connect data bus between microprocessor trainer kit & Stepper motor
interfacing kit.
3. Enter the program to rotate Stepper motor in clockwise &
anticlockwise.
4. Execute the program by typing:
GO E000:00C0 ENTER (for clockwise),
GO E000:0030 ENTER (for anticlockwise).
5. Observe the rotation of stepper motor.
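The two listings above differ only in the order in which the phase patterns are sent to the port. The sketch below models that order in Python, using the phase letters only (the actual bit patterns PHASEA to PHASED are not given in this section). The names FIRST_LISTING, SECOND_LISTING, phase_stream and is_reverse_rotation are hypothetical.

```python
from itertools import cycle, islice

# Phase order used by each listing, read off the MOV AL,PHASEx instructions.
FIRST_LISTING = ["C", "B", "D", "A"]
SECOND_LISTING = ["C", "A", "D", "B"]


def phase_stream(order, steps):
    """Phases output during the first `steps` steps of the endless loop."""
    return list(islice(cycle(order), steps))


def is_reverse_rotation(order_a, order_b):
    """True if order_b is order_a traversed backwards (up to rotation)."""
    backwards = order_a[::-1]
    return any(
        backwards[i:] + backwards[:i] == order_b
        for i in range(len(backwards))
    )


print(phase_stream(FIRST_LISTING, 6))
print(is_reverse_rotation(FIRST_LISTING, SECOND_LISTING))
```

Since one sequence is the other traversed backwards, the two programs drive the motor in opposite directions.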
10-25. References
[2] Peter Norton, Inside the IBM PC, Brady, New York, 1986.
Microprocessor Selection
Guide
Contents
11-1. Intel Microprocessors Selection Guide
11-2. AMD Microprocessors Selection Guide
11-3. SPARC Microprocessors Selection Guide
11-4. Processor Performance Factors
11-5. Benchmarks
11-6. Microprocessor Packages
11-7. Processor Sockets
11-8. Processor Bus Speed
11-9. Overclocking
11-10. Processor Supply Voltages
11-11. CPU Cooling
11-12. Summary
11-13. Problems
Prof. Dr. Muhammad El-SABA
Introduction to Microprocessors & Interface Circuits CHAPTER 11
Microprocessor
Selection Guide
11-1. Intel Microprocessors Selection Guide
At the time of writing the first version of this book, the most recent
release of Intel 80x86 microprocessors was the Intel® Pentium 4. However,
the Intel Core2 processors followed the Pentium 4 and became the
mainstay of more recent PCs and other computing platforms. The Pentium
4 processors are based on the Intel NetBurst microarchitecture and still
maintain the tradition of compatibility with IA-32 software.
Table 11-1. INTEL Processors, from 1971 to 2015. The indicated data bus width is
the internal data bus
Processor name | Processor No. | Tech. | No. of Cores | Clock speed (GHz) | FSB (MT/s) | L2-Cache (MB)
Celeron M | 520, 530, 540, 550, 560 | 65 nm | 1 | 1.6, 1.73, 1.86, 2, 2.13 | 533 | 1
Celeron M ULV | 523 | 65 nm | 1 | 0.933 | 533 | 1
Core 2 Solo ULV | U2100, U2200 | 65 nm | 1 | 1.06, 1.2 | 533 | 1
Pentium Dual Core | T2310, T2330, T2370, T2390 | 65 nm | 2 | 1.46, 1.6, 1.73, 1.86 | 533 | 1
Core 2 Duo ULV | U7500, U7600, U7700 | 65 nm | 2 | 1.06, 1.2, 1.33 | 533 | 2
Core 2 Duo | T5300 | 65 nm | 2 | 1.73 | 533 | 2
Core 2 Duo | T5250, T5450, T5550, T5750, T5850 | 65 nm | 2 | 1.5, 1.67, 1.83, 2.0, 2.1 | 667 | 2
Core 2 Duo | T5500, T5600 | 65 nm | 2 | 1.67, 1.83 | 667 | 2
Core 2 Duo | T5270, T5470, T7100, T7250 | 65 nm | 2 | 1.4, 1.6, 1.8, 2.0 | 800 | 2
Core 2 Duo LV | L7200, L7400 | 65 nm | 2 | 1.33, 1.5 | 667 | 4
Core 2 Duo LV | L7300, L7500, L7700 | 65 nm | 2 | 1.4, 1.6, 1.8 | 800 | 4
Core 2 Duo | T5200 | 65 nm | 2 | 1.6 | 533 | 2
Core 2 Duo | T5500, T5600 | 65 nm | 2 | 1.67, 1.83 | 667 | 2
Core 2 Duo | T7200, T7400, T7600 | 65 nm | 2 | 2, 2.16, 2.33 | 667 | 4
Core 2 Duo | T7300, T7500, T7700, T7800 | 65 nm | 2 | 2, 2.2, 2.4, 2.6 | 800 | 4
Core 2 Duo | T8100, T8300 | 45 nm | 2 | 2.1, 2.4 | 800 | 3
Core 2 Duo | T9300, T9500 | 45 nm | 2 | 2.5, 2.6 | 800 | 6
Processor name | Processor No. | Tech. | No. of Cores | Clock speed (GHz) | FSB (MT/s) | L2-Cache (MB)
Core 2 Extreme | X9000 | 45 nm | 2 | 2.8 | 800 | 6
Core i7 | I7-2960XM | 32 nm | 4 | 2.7-3.7 | 5000 | 8
Processor name | Processor No. | Tech. | No. of Cores | Clock speed (GHz) | FSB (MT/s) | L2-Cache (MB)
Dual-Core Xeon | 3040, 3050 | 65 nm | 2 | 1.86, 2.13 | 1066 | 2
Dual-Core Xeon | 3060, 3070 | 65 nm | 2 | 2.4, 2.67 | 1066 | 4
Dual-Core Xeon | 3065, 3075, 3085 | 65 nm | 2 | 2.33, 2.67, 3 | 1333 | 4
Dual-Core Xeon | E3110 | 45 nm | 2 | 3.0 | 1333 | 6
Dual-Core Xeon LV | 5128, 5138 | 65 nm | 2 | 1.86, 2.13 | 1066 | 4
Dual-Core Xeon LV | 5148 | 65 nm | 2 | 2.33 | 1333 | 4
Dual-Core Xeon | 5110, 5120 | 65 nm | 2 | 1.6, 1.86 | 1066 | 4
Dual-Core Xeon | 5130, 5140, 5150, 5160 | 65 nm | 2 | 2, 2.33, 2.67, 3 | 1333 | 4
Dual-Core Xeon | E5205 | 45 nm | 2 | 1.86 | 1066 | 6
Dual-Core Xeon | X5260 | 45 nm | 2 | 3.33 | 1333 | 6
Dual-Core Xeon | X5272 | 45 nm | 2 | 3.4 | 1600 | 6
Quad-Core Xeon | X3210, X3220, X3230 | 65 nm | 4 | 2.13, 2.4, 2.67 | 1066 | 8
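Processor name | Processor No. | Tech. | No. of Cores | Clock speed (GHz) | FSB (MT/s) | L2-Cache (MB)
Quad-Core Xeon LV | L5310, L5320 | 65 nm | 4 | 1.6, 1.86 | 1066 | 8
Quad-Core Xeon LV | L5335 | 65 nm | 4 | 2 | 1333 | 8
Quad-Core Xeon | E5310, E5320 | 65 nm | 4 | 1.6, 1.86 | 1066 | 8
Quad-Core Xeon | E5330, E5340, E5350 | 65 nm | 4 | 2.13, 2.4, 2.67 | 1066 | 8
Quad-Core Xeon | E5335, E5345, X5355, X5365 | 65 nm | 4 | 2, 2.33, 2.67, 3 | 1333 | 8
Quad-Core Xeon | E5405, E5410, E5420, E5430 | 45 nm | 4 | 2, 2.33, 2.5, 2.67 | 1333 | 12
Quad-Core Xeon | E5440, E5450, X5450, X5460 | 45 nm | 4 | 2.83, 3, 3, 3.16 | 1333 | 12
Quad-Core Xeon | E5462, E5472, X5472, X5482 | 45 nm | 4 | 2.8, 3, 3, 3.2 | 1600 | 12
Quad-Core Xeon LV | L7345 | 65 nm | 4 | 1.86 | 1066 | 8
Quad-Core Xeon | E7310, E7320 | 65 nm | 4 | 1.6, 2.13 | 1066 | 4
Quad-Core Xeon | E7330 | 65 nm | 4 | 2.4 | 1066 | 6
Quad-Core Xeon | E7340, X7350 | 65 nm | 4 | 2.4, 2.93 | 1066 | 8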
Note that the Pentium Pro, Pentium II, Pentium III, and Pentium
III Xeon processors belong to the 32-bit Intel Architecture
(IA-32) processors based on the P6 microarchitecture. The
Pentium 4, Pentium D, and Pentium Extreme Edition processors
are based on the Intel NetBurst microarchitecture. Most early
Intel Xeon processors are also based on the Intel NetBurst
microarchitecture. The first Intel Core processors (Core Solo and
Core Duo) are based on an improved version of the Pentium M
microarchitecture. The Intel Xeon processor 3x00 and 7x00 series,
Pentium dual-core, and Intel Core2 Duo are based on the Intel Core
microarchitecture, while the Intel Core i7 is based on the Nehalem
microarchitecture.
Note that in 2003, AMD extended the Intel 32-bit architecture (IA-32) to
64 bits, variously called x86-64 or AMD64. The AMD Opteron
processors, the AMD Athlon processors, the AMD Phenom processors, and
AMD Turion 64 mobile technology comprise the AMD64 family, as
follows:
Having failed long ago on the desktop and still being insignificant in the
overall notebook market (despite the availability of technically
impressive products) SPARC - unlike Intel architecture - is best viewed
solely as server processor architecture.
There are currently eight product families which make up the ARM
processor range:
ARM7 processor family
ARM9 processor family
ARM9E processor family
ARM10E processor family
ARM11 processor family
Cortex processor family
SecurCore processor family
OptimoDE Data Engines
Further implementations of the ARM architecture are available from ARM's
partners, such as the Intel XScale microarchitecture.
Clock speed can be used to compare the performance of processors only if
they are otherwise identical internally. A Pentium 200 is in fact about
20% faster than a Pentium 166. But it isn't 20% faster than a Pentium
with MMX 166, because of the improvements in the latter architecture.
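The 20% figure is simply the ratio of the two clock rates. A short Python sketch of the arithmetic follows; the function name clock_speedup is hypothetical, and the result is only meaningful for processors that are internally identical.

```python
def clock_speedup(new_mhz, old_mhz):
    """Relative speed gain implied by clock rate alone."""
    return new_mhz / old_mhz - 1.0


# Pentium 200 versus Pentium 166 (same microarchitecture).
print(f"{clock_speedup(200, 166):.1%}")
```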
11-6. Benchmarks
Benchmarks are standard evaluation programs, which can be run on
different computers to give a measure of their performance. Processors
are frequently benchmarked, in the hopes of coming up with a single
number that can capture the value of the CPU and let it be easily
compared to others. There is a difference between benchmarking
processors and benchmarking whole systems. System benchmarking
attempts to evaluate "real world" performance of a whole system, which
involves much more than just the processor. Processor benchmarks show
isolated performance of processors relative to one another. Even in just
looking at processors, there are many types of benchmarks, which can
result in many different scores.
iCOMP 2.0: This is Intel's revised iCOMP benchmark, used for Pentium
and later processors. It is also an amalgam of several other benchmarks
(including SI32 and CPUmark32). It focuses more on 32-bit performance
than the original iCOMP index, and also partially incorporates a
multimedia benchmark. The following table depicts the benchmark of
several processors. The benchmark values were obtained from many
different sources. Where possible, we used official values from
manufacturers, after cross-referencing with independent numbers. Values
with a tilde "~" in front of them are extrapolated or approximated.
Table 11-5. Benchmarks of different processors.
-25 122 -- 54 !? --
80486DX -50 249 -- 109 !? --
-133 ~610 ~67 288 18 ~160
-100 ~610 ~67 264 ~16 ~150
AMD 5x86 -120 ~735 ~81 316 19 ~180
Cyrix 5x86 60 510 51 190 ~16 ~120
66 567 57 211 ~18 ~140
Pentium 75 610 67 237 23 181
100 815 90 317 30 243
133 1110 111 421 36 300
166 1308 127 529 40 343
233 ~2210 203 ~890 62 460
200 -- 220 -- 90 553
Pentium 200 -- ~240 -- 98 611
with MMX 233 -- 267 -- ~115 ~640
300 -- 332 -- !? !?
AMD K5 333 -- 366 -- !? !?
233 -- !? -- ~91 !?
Pentium 266 -- !? -- ~100 !?
Pro 233 -- 267 -- 115 640
300 -- 332 -- !? !?
Pentium II 166 -- !? -- 73 420
233 -- !? -- 91 !?
AMD K6 266 -- !? -- 100 !?
The following benchmarks are more recent, and are currently used to
compare the modern microprocessor systems.
DIP (dual inline package), QFP (quad flat pack), FC-PGA (pin grid array)
and their variants (P=Plastic, C=Ceramic, etc.)
Socket 8 is a 387-pin socket used by the Intel Pentium Pro and
Pentium II OverDrive processors. The pins are arranged in a 24×26 matrix
that is more rectangular than that of Socket 7. The voltages used with
Socket 8 fall in the range of 3.1V to 3.3V. The Intel Pentium Pro 150 to
200 MHz and Pentium II OverDrive 300 to 333 MHz use this socket.
Socket 370: Once Intel changed designs from sockets to slots, people who
were used to the ease of use of socket designs didn't want to use a slot. So
Intel developed Socket 370 (named for the fact that it uses 370 pins).
Socket 370 supports the Intel Pentium II and III as well as
several Celeron processors.
Socket A is one of the most popular sockets today. This socket consists
of 462 pins and looks similar to other socket types. Socket A
primarily supports the AMD Athlon, Duron and Athlon XP processors.
Socket 423 came with the introduction of the Pentium 4. Socket 423
was introduced alongside the Intel 850 motherboard chipset.
Socket 478 is used with Pentium 4 processors. Socket 478 is similar
in appearance to Socket 423 but it has more pins for extra
capabilities.
Fig. 11-9. Socket 370 and Slot 1, their locations on the motherboard
The next move of Intel was away from Sockets to an edge connector
configuration, called Slot 1. This was first used with the Pentium II. All
versions of the Pentium II were packaged on a special daughter-board
that plugs into a card-edge processor slot on the motherboard. The
daughter-board is enclosed within a rectangular box called a Single Edge
Contact (SEC) cartridge. Slot1 is electrically identical to Socket 8 but is
an edge connector, rather than a Pin Grid Array (PGA) socket. Actually,
Pentium II required a 242-pin Slot1, while Xeon processor used a 330-pin
slot called Slot 2. Intel refers to Slot1 and Slot2 as SEC-242 and SEC-
330 in some of their technical documentation. The daughter-board has
mounting points for the Pentium II CPU itself plus various support chips
and cache memory chips.
Slot1: With the development of the Pentium II, Intel designed the Single
Edge Contact Cartridge (SECC) to contain its processors. The SECC has
the processor on a small circuit board with 242 small fingers or leads.
This board is then inserted into a special slot on the motherboard called a
Slot1 connector. Slot1 connectors support SECC processors
including the Intel Celeron, Pentium II, and some Pentium IIIs.
Slot2: Slot2 specifies a 330-lead edge connector for the processor
card. It functions similarly to the Slot1 setup but has more leads on the
connector between the card and the motherboard. It allows the CPU to
communicate with the L2-cache at the CPU's full clock speed, thus enhancing
performance. Slot2 was primarily designed for use in workstations with
SECC Pentium III and Xeon processors.
The Intel LGA 2011 (also called Socket R) replaced Intel's LGA 1366
(Socket B) and LGA 1567 in the high-end desktop and server platforms.
The socket was released in 2011 and supports Sandy Bridge-E processors
with 4 memory channels of DDR3-1600 as well as 40 PCIe 2.0 or 3.0 lanes.
Table 11-6 identifies some CPU interface sockets from the time of Intel's
socket1.
Feature | Slot 1 | Slot 2 | Socket 370 | Socket 423 | Socket 478 | Socket T
No. of pins | 242 | 330 | 370 | 423 | 478 | 775
Voltage | 3.3/2.5 | 3.3/2.5 | 3.3/2.5 | 1V - 1.85V | 1V - 1.85V | 1V - 1.85V
CPU | Celeron, Pentium 3 | Pentium 2, Pentium 3 | Celeron, Pentium 3 | Pentium 4 | Pentium 4 | Pentium 4, Core, Core2
Speed < 2.8 GHz > 3 GHz
As for the latest AMD processors, there exists a variety of sockets for
different processors. For instance, all K7-based Sempron processors are
compatible with Socket 462 (Socket A). About the same time the Slot 1
connector began to gain use, AMD developed its own card and slot
processor (Slot A). Slot A looks the same as Slot 1 and they are
physically the same size. Slot A allows for a higher bus rate than
Socket 7 and it is used primarily with the AMD K7 processor family. Also,
K8-based Sempron processors use Socket 754. Figure 11-9 shows some
sockets for AMD processors. Table 11-6 depicts some AMD sockets.
In 2006, AMD released Socket AM2 for desktop processors. AM2 (940
pins) was announced as a replacement for Socket 754 and Socket 939. AM2
supports AMD Athlon 64, Sempron, Opteron and Phenom processors.
Socket AM2 is a part of AMD's generation of CPU sockets, along with
Socket F for servers and Socket S1 for mobile computing. Socket
AM3 (938-contact PGA) is intended for single AMD processors, with
support for DDR3-SDRAM and separated power lanes.
Fig. 11-10. Photographs of the Socket 754, the AM2 socket (AMD) and the LGA 2011 socket (Intel).
Note: Overclocking
Overclocking will void the warranty on the parts being overclocked.
Doing so may also cause system instability, and may also cause damage
to components and data. Be careful and cautious when overclocking.
The CPU's clock speed is the FSB clock speed (base, not effective speed)
times the CPU's multiplier. On most new CPUs, the multiplier is locked,
so you will have to adjust the FSB clock speed. The FSB is not adjustable
on a few motherboards, and many OEM systems. The FSB and
multiplier, if not locked, are adjustable from within the BIOS. Note that
upping the FSB clock speed also increases the clock speed of many other
components, including RAM. When increasing the FSB clock speed, only
do so in small increments of a few MHz at a time. After you do this, boot
up your computer to make sure it works. If your computer successfully
boots, increase the FSB some more. If it won't boot, lower the FSB until
your computer properly boots up. Repeat until you have the highest
setting with which your computer will boot up. Next, test your OS to
make sure it is stable with a burn-in application, or any application.
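The relation between FSB clock, multiplier and CPU clock described above can be sketched as follows. The function names and the example figures (a 200 MHz FSB, a locked multiplier of 15, steps of 5 MHz) are hypothetical and chosen only to show the arithmetic; real limits depend on the CPU, the motherboard and the cooling.

```python
def cpu_clock_mhz(fsb_mhz, multiplier):
    """CPU clock = FSB base clock (not the effective rate) x multiplier."""
    return fsb_mhz * multiplier


def fsb_candidates(start_mhz, step_mhz, count):
    """FSB settings to try in turn, raised in small increments."""
    return [start_mhz + i * step_mhz for i in range(count)]


# Hypothetical CPU: locked multiplier of 15, stock FSB of 200 MHz.
for fsb in fsb_candidates(200, 5, 4):
    print(f"FSB {fsb} MHz -> CPU {cpu_clock_mhz(fsb, 15)} MHz")
```

Each candidate would be applied in the BIOS and followed by a boot and stability test before moving on to the next one.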
Note that some of the cheaper pads can melt in unexpected heat and may
cause problems and potentially even damage if you are overclocking. In
either case, thermal paste is usually more effective, just harder to apply. If
you are planning a long-term installation, a thermal pad is suggested.
Non-conductive thermal pastes based on silicone are the cheapest and safest.
Silver-based thermal pastes sometimes perform better than normal
thermal pastes, and carbon-based ones perform better still. Some low-
noise CPU cooling fans require special mounting hardware on the
motherboard. Be sure that the cooling fan you choose is compatible with
your motherboard.
11-13. Summary
In this chapter, we presented the recent releases of Intel and AMD x86
microprocessors. We also recapitulated all the Intel microprocessors
which have been introduced since 1974. As shown in the following table, the
Pentium 4 processors are based on the Intel NetBurst microarchitecture
and still maintain the tradition of compatibility with IA-32 software. The
Intel Core processors (e.g., Core2 and Core i7) followed the Pentium 4
and became the mainstay of recent PCs and other computing platforms.
| Processor | Clock (GHz) | Socket | Tech (nm) | Power (W) | Cores | Bus Speed MHz (MT/s) | L2 Cache | L3 Cache |
|---|---|---|---|---|---|---|---|---|
| Pentium | 0.06-0.2 | Socket 2, 3, 4, 5, 7 | 800-350 | NA | 1 | 50-66 | - | - |
| Pentium MMX | 0.12-0.3 | Socket 7 | 350-250 | NA | 1 | 60-66 | - | - |
| Atom | 0.8-2.13 | PBGA437, PBGA441 | 32, 45 | 0.65-13 | 1, 2 | 400, 533, 667, 2. | 512 KB - 1 MB | - |
| Celeron | 0.266-3.6 | Slot 1, 37,47 495, LGA 775, Socket M, Socket T | 45, 65, 90, 130, 180, 250 | 5.5-6 | 1, 2 | 66, 100, 133, 400, 533, 800 | 0 KB - 1 MB | - |
| Pentium Pro | 0.15-0.2 | Socket 8 | 350, 500 | 29.2-47 | 1 | 60, 66 | 256 KB, 512 KB, 1024 KB | - |
| Pentium II | 0.233-0.45 | Slot 1, MMC-1, MMC-2, Mini-Cartridge | 250, 350 | 16.8-38.2 | 1 | 66, 100 | 256 KB - 512 KB | - |
| Pentium III | 0.45-1.4 | Slot 1, Socket 370 | 130, 180, 250 | 17-34.5 | 1 | 100, 133 | 256 KB - 512 KB | - |
| Xeon | 0.4-4.4 | Slot 2, Socket 603, Socket 604, Socket J, T, B, LGA 1156, LGA 1366 | 45, 65, 90, 130, 180, 250 | 16-165 | 1, 2, 4, 6, 8 | 100, 133, 400, 533, 667, 800, 1066, 1333, 1600, 4800, 5860, 6400 | 256 KB - 12 MB | 4 MB - 16 MB |
| Pentium 4 | 1.3-3.8 | Socket 423, Socket 478, LGA 775, Socket T | 65, 90, 130, 180 | 21-115 | 1 | 400, 533, 800, 1066 | 256 KB - 2 MB | - |
| Pentium 4 Extreme Edition | 3.2-3.73 | Socket 478, Socket T | 90, 130 | 92-115 | 1 | 800, 1066 | 512 KB - 1 MB | 0 KB - 2 MB |
| Pentium M | 0.8-2.266 | Socket 479 | 90, 130 | 5.5-27 | 1 | 400, 533 | 1 MB - 2 MB | - |
| Pentium D/EE | 2.66-3.73 | Socket T | 65, 90 | 95-130 | 2 | 533, 800, 1066 | 2×1 MiB, 2×2 MiB | - |
| Pentium Dual-Core | 1.6-2.93 | Socket 775, Socket M, P, T | 45, 65 | 10-65 | 2 | 533, 667, 800, 1066 | 1 MB - 2 MB | - |
| Pentium (New) | 1.2-3.33 | Socket 775, LGA 1156, LGA 1155 | 32, 45, 65 | 5.5-73 | 1, 2 | 800, 1066, 2.5 GT/s, 5 GT/s | 2×256 KB - 2 MB | 0 KB - 3 MiB |
| Core | 1.06-2.33 | Socket M | 65 | 5.5-49 | 1, 2 | 533, 667 | 2 MB | - |
| Core 2 | 1.06-3.33 | Socket 775, Socket M, P, J, T | 45, 65 | 5.5-150 | 1, 2, 4 | 533, 667, 800, 1066, 1333, 1600 | 1 MB - 12 MB | - |
| Core i3 | 2.4-3.4 | LGA 1156, LGA 1155 | 22, 32 | 35-73 | 2 | 1066, 1600, 2.5-5 GT/s | 256 KB | 3 MB - 4 MB |
| Core i5 | 1.06-3.46 | LGA 1156, LGA 1155 | 22, 32, 45 | 17-95 | 2, 4 | 2.5-5 GT/s | 256 KB | 4 MB - 8 MB |
| Core i7 | 1.6-3.6 | LGA 1156, 1366, 2011 | 22, 32, 45 | 45-130 | 4 | 4.8 GT/s, 6.4 GT/s | 4×256 KB | 6 MB - 10 MB |
| Core i7 | 3.2-4 | LGA 1366, LGA 2011 | 32, 22, 14 | 130 | 6 | 6.4 GT/s | 6×256 KB | 12 MB - 15 MB |
In the older architectures, the front-side bus (FSB) was the interface for
exchanging data between the CPU and the chipset north bridge. If the
CPU had to read or write system memory or access the PCIe bus, the
data had to traverse the external FSB. In the newer Nehalem
microarchitecture, Intel moved the memory controller and the PCIe controller
from the north bridge onto the CPU die. These changes increase
data throughput and reduce the latency of memory and I/O transactions.
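
To relate the bus speeds listed in the table (in MT/s or GT/s) to data
throughput, the following minimal sketch estimates the theoretical peak
bandwidth of a link as transfers per second times bytes per transfer. The
function name and the two width constants are illustrative assumptions, not
taken from the text: a 64-bit (8-byte) FSB data bus, and a point-to-point
link carrying 16 data bits (2 bytes) per transfer in each direction. Real
throughput is lower because of protocol overhead and bus contention.

```python
def peak_bandwidth_gb_s(transfers_mt_s: float, bytes_per_transfer: int) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return transfers_mt_s * 1e6 * bytes_per_transfer / 1e9


# Assumption: the FSB data bus is 64 bits (8 bytes) wide and is shared by
# all memory and I/O traffic.
FSB_WIDTH_BYTES = 8

# Assumption: the point-to-point link moves 16 data bits (2 bytes) per
# transfer, independently in each direction.
LINK_WIDTH_BYTES = 2

fsb_1333 = peak_bandwidth_gb_s(1333, FSB_WIDTH_BYTES)    # 1333 MT/s FSB
link_6400 = peak_bandwidth_gb_s(6400, LINK_WIDTH_BYTES)  # 6.4 GT/s link

print(f"FSB at 1333 MT/s : {fsb_1333:.3f} GB/s (shared)")
print(f"Link at 6.4 GT/s : {link_6400:.3f} GB/s per direction")
```

Under these assumptions, a single direction of the 6.4 GT/s link already
exceeds the shared peak of a 1333 MT/s FSB, and memory traffic no longer
crosses that link at all once the memory controller is on the CPU die.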
11-14. Problems
11-4) Compare the Pentium 4 and AMD K7 processors in terms of
speed, number of pins, sockets, operating voltage and
power dissipation.
11-5) Compare the Intel Core 2 Duo and AMD Sempron
processors in terms of speed, number of pins, sockets,
operating voltage and power dissipation.
11-15. Bibliography