
Godse Microprocessor Microcontroller 3ed

Technical Publications Pune
Microprocessors & Microcontrollers
ISBN 978-81-8431-297-3

All rights reserved with Technical Publications. No part of this book should be reproduced in any form, electronic, mechanical, photocopy or any information storage and retrieval system, without prior permission in writing from Technical Publications, Pune.

Published by: Technical Publications Pune, #1, Amit Residency, 412, Shanivar Peth, Pune - 411 030, India.
Printers: Vikram Printers, 34, Parvati Industrial Estate, Pune-Satara Road, Pune - 411 009.

Preface
Thanks to professors, students and authors of various technical books for their overwhelming response to our books. Looking at the feedback and the response we received from previous books, we are very pleased to release a textbook on Microprocessors & Microcontrollers.

The purpose of this book is to fulfil the need for a text written in plain, lucid and simple everyday language. This book provides a logical method of explanation and prepares a background for each topic with essential illustrations. The text includes a number of solved design examples which help students to understand the applications of microprocessor and microcontroller based systems.

The rapid spread of microprocessors and microcontrollers in society has both simplified and complicated our lives. The aim of this text is to introduce concepts related to microprocessors and, with that background, to discuss details of the microcontroller family. The text basically covers details of the Pentium microprocessor and the 8051 microcontroller: architecture, instruction set and programming, and interfacing with keyboard, display and other devices. It also discusses the operating modes of the Pentium processor, its I/O organisation and memory organisation. The text also introduces PIC microcontrollers.

Acknowledgement
We wish to express our profound thanks to all those who helped in making this book a reality.
Much needed moral support and encouragement was provided on numerous occasions by our whole family. We are especially grateful to the great teacher Prof. A.V. Bakshi for his timely and valuable guidance. Without the full support and cheerful encouragement of Mr. Uday Bakshi the book would not have been completed in time. Finally, we wish to thank Mr. Avinash Wani, Mr. Ravindra Wani and the entire team of Technical Publications who have taken immense pains to get the quality printing done in time.

Any suggestions for the improvement of the book will be acknowledged and appreciated.

Atul Godse
Deepali Godse

Dedicated to My Parents

1.3 Pentium Architecture and Functional Description
1.4 Pin Description
1.5.1 Real Mode Programming Model
1.5.2 Memory Addressing in Real Mode
1.5.3 Handling Interrupts and Exceptions in Real Mode
1.7 Pentium Super-scalar Architecture
1.8 Pipelining
1.9 Instruction Pairing Rules
1.11.1 Cache Memory
1.11.2 Two Level Cache System
1.11.3 Pentium Cache Organisation
1.12 Floating Point Unit
2.2 RESET Operation
2.3 Bus Operations and Bus Cycles
2.4 Bus Cycle States
2.5 Non-Pipelined Bus Cycles
2.5.1 Non-pipelined Read Cycle
2.5.2 Non-pipelined Write Cycle
2.6 Pipelined Read/Write Cycle
2.7 Burst Cycle
2.8 Memory Organisation
2.9 I/O Organisation
2.9.1 I/O Mapped I/O
2.9.2 Memory Mapped I/O
2.10 Data Transfer Mechanism - 8-bit, 16-bit, 32-bit and 64-bit
Review Questions
3.2 Programmer's Model
3.2.1 General Purpose Registers
3.2.2 Segment Registers
3.2.3 Index, Pointers, and Base Registers
3.2.4 EFLAGs Register
3.2.4.1 Status Flags
3.2.4.2 Control Flags
3.2.4.3 System Flags
3.2.5 More about EFLAGs
3.2.6 System Address Registers
3.2.7 System Registers
3.2.7.1 Control Registers
3.2.7.2 Debug Registers
3.2.7.3 Test Registers
3.3 Pentium Addressing Modes
3.4 Pentium Data Types
3.5 Instruction Set Summary
3.5.1 Data Transfer Instructions
3.5.2 Binary Arithmetic Instructions
3.5.3 Decimal Arithmetic Instructions
3.5.4 Logical Instructions
3.5.6 Bit and Byte Instructions
3.5.8 String Instructions
3.6.10 ENTER and companion LEAVE instructions
3.6.11 Flag Control (EFLAG) Instructions
3.6.12 Segment Register Instructions
4.2 Protected Mode-Support Registers
4.3 Logical to Physical Address Translation
4.4 Segmentation
4.5 Segment Descriptors and Memory Management through Segmentation
4.5.1 Types of Segment Descriptors
4.5.1.1 Non-system Segment Descriptor
4.5.1.2 System Segment Descriptors
4.5.2 Descriptor Tables
4.5.3 More about Segment Registers
4.6 Paging
4.6.1 Support Registers and Tables
4.6.2 PDE Descriptor
4.6.3 PTE Descriptor
4.7 Translation Lookaside Buffer or Page Translation Cache
4.8 Paging Operation
4.9 Protection
4.9.1 Protection By Segmentation
4.9.2 Privilege Level Protection
4.9.2.1 Restricting Access to Data
4.9.2.2 Accessing Data in Code Segments
4.9.2.3 Restricting Control Transfers
4.9.3 Inter-privilege Level Transfer of Control
4.9.3.1 Conforming Code Segment
4.9.3.2 Call Gates
4.9.4 Changing Stacks
4.9.5 Page Level Protection
4.9.5.1 Restricting Addressable Domain
4.9.5.2 Type Checking
4.10 Privileged Instructions
4.10.1 Privileged Instructions
4.11 Special Protection Mode Instructions
4.12 Demand Paging
4.13 Moving to Protected Mode
4.14 Switching Back to Real Address Mode
4.15 Virtual Memory
5.2 Scheduling Methods for Multi-user Operating System
5.2.1 Time-Slice Scheduling
5.2.2 Pre-emptive - Priority Based Scheduling
5.2.3 Context Switching
5.3 Support Registers and Related Descriptors for Multitasking
5.3.1 Task State Segment (TSS)
5.3.3 Task Register (TR)
5.3.4 Task Gates and Task Gate Descriptor
5.4 Task Switching
5.4.1 Task Switching Without Task Gate
5.4.2 Task Switching with Task Gate
5.5.1 I/O Privilege Level
5.5.2 I/O Permission Bit Map
6.2 Entering and Leaving 8086 Virtual Mode
6.2.1 Entering 8086 Virtual Mode
6.2.2 Leaving 8086 Virtual Mode
6.3 Registers and Instructions
6.3.1 Registers
8.1 Introduction
8.2 Features of 8051
8.3 MCS-51 (8051) Family Architecture
8.3.1 Pin-out of 8051
8.3.2 Central Processing Unit (CPU)
8.3.3 On-chip Data Memory and Register Bank
8.3.4 On-chip Program Memory
8.3.5 Input/Output Ports
8.3.6 Register Set
8.3.6.1 Register A (Accumulator)
8.3.6.2 Register B
8.3.6.3 Program Status Word (Flag Register)
8.3.6.4 Stack and Stack Pointer
8.3.6.5 Data Pointer (DPTR)
8.3.6.6 Program Counter
8.3.6.7 Special Function Registers
8.3.7 The 8051 Oscillator and Clock
8.4 Memory Organization in 8051
8.5 Input/Output Pins, Ports and Circuits
8.6 External Data Memory and Program Memory
8.6.1 External Program Memory
8.6.2 External Data Memory
8.6.3 Important Points to Remember in Accessing External Memory
8.7 Timers/Counters and their Programming
8.7.2 Timer/Counter Control Logic
8.7.3 Timer 0 and Timer 1
8.7.4 Programming
8.8 Serial Port and their Programming
8.8.1 Operating Modes for Serial Port
8.8.2 Serial Port Control Register
8.8.3 Generating Baud Rates
8.8.4 Programming 8051 for Serial Data Transfer
8.8.5 Programming 8051 for Receiving Serial Data
8.9 Interrupt Structure
8.9.1 Priority Level Structure
8.9.2 External Interrupts
8.9.3 Single-Step Operation
8.10 Other Features
8.10.1 Power Saving Options
8.10.2 Idle Mode
8.10.3 Power Down Mode
8.10.4 Multiprocessor Communication in 8051
Review Questions
9.1 Introduction
9.2 Minimum System
9.2.1 Supporting Circuits
9.2.1.1 Clock Circuits
9.2.1.2 Demultiplexing
9.2.1.3 Reset Circuit
9.2.2 Memory Interfacing
9.2.3 Interfacing Example
9.3 8051 I/O Expansion using 8256
9.4 Interfacing Keyboard
9.4.1 Key Debounce using Hardware
9.4.2 Key Debouncing using Software
9.4.3 Simple Keyboard Interface
9.4.4 Matrix Keyboard Interface
9.5 Interfacing Display
9.5.1 LED Displays
9.5.2 Interfacing LED Displays
9.6 Interfacing LCD Display
9.7 Interfacing DAC to 8051
9.7.1.1 Important Electrical Characteristics for IC 1408
9.7.2 Interfacing DAC 1408 / DAC 0808 with 8051
9.8 Interfacing ADC to 8051
9.8.1 ADC 0804 Family
9.8.2 ADC 0808/0809 Family
9.8.3 Interfacing of ADC 0803/0804/0805 with 8051
9.9 Stepper Motor Interfacing
9.10 Typical MCS-51 Based System
9.11 Interfacing Examples
10.3 PICs with Key Features
10.4 PIC16C6X
10.4.1 Features of 16C6X Microcontroller
10.4.1.1 Core Features
10.4.1.2 Peripheral Features
10.4.2 Block Diagram
10.4.3 Pin Diagram
10.4.4 Memory Organization
10.4.4.1 Program Memory
10.4.4.2 Data Memory
10.5 PIC16F8XX
10.5.1 Features
10.5.1.1 Core Features
10.5.1.2 Peripheral Features
10.5.2 Block Diagram
10.5.3 Pin Diagram
10.5.4 Memory Organization
10.5.4.1 Program Memory Organization
10.5.4.2 Data Memory Organization
10.6 Reset and Clocking in PIC
10.6.1 Reset
10.6.1.1 Power-on Reset (POR)
10.6.1.2 Brown-out Reset (BOR)
10.6.1.3 Watch Dog Timer (WDT)
10.6.2 Clocking
10.6.2.1 Clocking Scheme/Instruction Cycle
10.6.2.2 Instruction Flow/Pipelining
10.7.4 Registers
10.7.6 Program Memory Paging
10.8 I/O Ports in PIC16C6X and PIC16F87X
10.8.1 PORTA and TRISA Register
10.8.2 PORTB and TRISB Register
10.8.3 PORTC and TRISC Register
10.8.4 PORTD and TRISD Registers
10.8.5 PORTE and TRISE Registers
10.8.6 Parallel Slave Port (PSP)
10.9 Interrupts in PIC16C6X and PIC16F87X
10.9.1 INT Interrupt
10.9.2 TMR0 Interrupt
10.9.3 PortB INTCON Change
10.9.4 Context Saving During Interrupts
10.10.3 Timer2 Module
10.11 Capture/Compare/PWM Modules in PIC16C6X and PIC16F87X
10.11.2 Compare Mode
10.11.3 PWM Mode (PWM)
PWM Duty Cycle
10.11.3.3 Setup for PWM Operation
10.12 Data EEPROM and Flash Program Memory in PIC16F87X
10.12.1 EECON1 and EECON2 Registers
10.12.2 Reading the EEPROM Data Memory
10.12.3 Writing to the EEPROM Data Memory
10.12.4 Reading the FLASH Program Memory
10.12.5 Writing to the FLASH Program Memory
10.13 ADC in PIC16F87X
10.14 Addressing Modes in PIC16C6X and PIC16F87X
10.14.1 Direct Addressing
10.14.2 Indirect Addressing
10.15 Instruction Set of PIC16CXX and PIC16F8XX
10.15.1 Instruction Descriptions

Introduction to Pentium Microprocessor

1.1 Historical Evolution of Microprocessors
To understand any microprocessor and the advancements within its family, it is necessary to look back in time and study its evolution. In this chapter we are going to study the Pentium microprocessor.
Before studying the details of the Pentium microprocessor, we will look at the historical evolution of the Pentium microprocessor family, along with the features of the 80286, 80386 and 80486. Fig. 1.1 shows the complete Intel family of processors with different abilities and power. From this diagram it can be noticed that the bigger the number, the more powerful the processor, and that adding SX to the end means a cut-down version.

The first generation in the Intel family is the 4004. After producing the 4004, Intel announced the 8008, a larger, faster version. In 1974 Intel came out with the 8080, a considerable improvement over its predecessors. Then Intel came out with the 8085 microprocessor. The 8008, 8080 and 8085 represent a progression of 8-bit processors, with each new device including more circuitry and being more flexible.

The next generation was the 8086, a 16-bit processor with an advanced architecture and instruction set. At the same time Intel introduced the 8088. The 8088 is a version of the 8086 with an 8-bit external data bus: it has fewer data lines but retains all of the processing features of the 8086. Programs that run on the 8088 will also run, without modification, on the 8086. The 8086/88 pair were the first members of the iAPX 86 family of microprocessors. (See Fig. 1.1.)

In 1983 the next version was announced, the 80186/80188, very similar to the 8086/8088 pair. The 80186/80188 included many useful peripheral I/O functions as an integral part of the microprocessor. The improved instruction set of the 80186/80188 supports these peripheral I/O functions. Although the 80186 provided increased functionality, it maintained compatibility with the 8086, ensuring that it could execute 8086 programs.

After the 80186/80188, Intel announced the 80286, which is a 16-bit processor like the 8086. The 80286 was the first family member designed specifically for use as a CPU in a multi-user microcomputer. It contains many advanced modes of operation not supported by the 8086.
The 80286 boasted a new mode of operation: protected mode. Due to this, the entire concept of memory segmentation was changed. Virtual memory management circuitry was included in the 80286, which allows an 80286 to operate in either real address mode or protected virtual address mode.

Fig. 1.1 The complete Intel family

Features of 80286
1. The 80286 is a 16-bit processor. The 16-bit ALU allows it to process 16-bit data.
2. It has a 24-bit address bus, so it can access up to 16 Mbytes (2^24) of physical memory or 1 Gigabyte (2^30) of virtual memory.
3. The 80286 can be operated at three different clock speeds: 4 MHz (80286-4), 6 MHz (80286-6) and 8 MHz (80286).
4. The 80286 includes special instructions to support operating systems.
5. The 80286 is housed in a 68-pin leadless flat package. This makes it possible to provide separate pins for address lines and data lines, which speeds up processing and simplifies the hardware.
6. It contains four separate processing units: the Bus Unit (BU), the Instruction Unit (IU), the Address Unit (AU) and the Execution Unit (EU). This pipelined architecture greatly improves the performance of the 80286.
7. The 80286 microprocessor is compatible with the earlier 8086, 8088, 80186 and 80188 chips. Virtually anything that runs under those microprocessors will also run under the 80286.
8. It has virtual memory management circuitry and protection circuitry, which allow an 80286 to operate in either real address mode or protected virtual address mode.
9. The 80286 was the first family member designed specifically for use as a CPU in a multi-user microcomputer.

In 1986, the next advanced processor, the 80386DX, was introduced. As expected, the 80386DX is faster than any of its predecessors, with a minimum operating frequency of 16 MHz.
It is a 32-bit processor with a 32-bit register set, address bus and data bus.

During 1988, an "economy version" of the 80386, called the 80386SX, was introduced by Intel. This processor had the same outside connections as the 80286, but inside it was a 386 processor supporting the 386's expanded instruction set and various operating modes. Table 1.1 shows the 80X86 family tree.

Chip | Introduction | Data bus | Address bus
4004 | 1971 | 4 | 8
8008 | 1972 | 8 | 8
8080 | 1974 | 8 | 16
8085 | 1977 | 8 | 16
8086/88 | 1978 | |
80186/188 | 1982 | |
80286 | 1983 | |
80386DX | 1986 | |
80386SX | 1988 | |
80486DX (with coprocessor) | 1989 | |
80486SX (without coprocessor) | 1989 | |
Pentium | 1993 | |
Table 1.1 80X86 family tree

Features of 80386
1. The 80386 is a 32-bit processor. The 32-bit ALU allows it to process 32-bit data.
2. It has a 32-bit address bus, so it can access up to 4 Gbytes (2^32) of physical memory or 64 Terabytes (2^46) of virtual memory (explained in a later section).
3. The 80386 runs at clock speeds of up to 20 MHz.
4. The pipelined architecture of the 80386 allows simultaneous instruction fetching, decoding, execution and memory management. Instruction pipelining, a high bus bandwidth and on-chip address translation significantly shorten the average instruction execution time of the 80386. These architectural design features enable the 80386 to execute 3 to 4 million instructions per second.
5. It allows programmers to switch between different operating systems such as PC-DOS and UNIX.
6. It can operate on 7 different data types: a. Bit, b. Byte, c. Word, d. Double word, e. Pword, f. Quadword, g. Tenbyte.
7. It has built-in virtual memory management circuitry and protection circuitry required to operate an 80386 in these modes.
8. The 80386 can operate in real mode, protected mode or a variation of protected mode called virtual 8086 mode. In real mode it functions basically as a fast 8086 or real mode 80286.
The protected mode operation provides paging, virtual addressing, multilevel protection, multitasking and debugging capabilities.
9. The 80386 microprocessor is compatible with the earlier 8086, 8088, 80186, 80188 and 80286 chips. Virtually anything that runs under these microprocessors will also run under the 80386.

Early in 1989, Intel introduced the 80486DX, a more highly integrated microprocessor with a built-in coprocessor. Meanwhile, Intel also developed a stepped-down version, the 80486SX (without coprocessor and with lower clock speed).

Features of 80486
1. It is a highly integrated device containing about 1.2 million transistors.
2. The 80486 operates at 25 MHz, 33 MHz, 50 MHz, 66 MHz or 100 MHz.
3. It has a built-in math coprocessor.
4. The 80486 is a 32-bit architecture with on-chip memory management and cache memory units.
5. On-chip cache memory allows frequently used data and code to be stored on-chip, thereby reducing accesses to the external bus.
6. The MMU consists of a segmentation unit and a paging unit. Segmentation allows management of the logical address space by providing easy data and code relocatability and efficient sharing of global resources. Paging allows operating system designers to make physical memory appear to be anywhere in the 4 Gigabyte address space. In protected virtual mode it can manage up to 64 Terabytes of virtual memory.
7. The MMU provides four levels of protection for isolating and protecting applications and the operating system from each other.
8. The 80486 has three modes of operation: Real mode, Protected mode and Virtual 8086 mode.
9. It is available in two versions: 80486DX and 80486SX. The only difference between these two versions is that the 80486SX does not contain the numeric coprocessor.
10. Most of the 80486 instructions require only one clock instead of the two clocks required by the 80386.
11.
It supports a five-stage instruction pipeline scheme that allows it to execute instructions much faster than the 80386.
12. It executes conditional jump instructions more efficiently. When the 80486 decodes a conditional jump instruction, it automatically prefetches one or more instructions from the jump destination address just in case the jump is taken. Therefore, if the branch is taken, the 80486 does not have to wait through a bus cycle for the first instruction at the branch address.
13. It has a built-in parity checker/generator unit to implement parity detection and generation for memory reads and writes.
14. It supports burst mode memory reads and writes to implement fast cache fills.
15. It executes a few new instructions that control the internal cache memory and allow addition (XADD) and comparison (CMPXCHG) with an exchange, and a byte swap (BSWAP) operation. Other than these few additional instructions, the 80486 is 100 percent compatible with the 80386 and 80387.
16. It supports a built-in self-test, which tests the microprocessor, coprocessor and cache at reset time. If the 80486 passes the test, EAX contains zero.
17. It has additional test registers (TR3 - TR5) to test the cache memory.

The Pentium, introduced in 1993, was similar to the 80386 and 80486 microprocessors. It contained a larger internal cache, and its data bus width was extended to 64 bits. Table 1.2 shows the comparison between various Pentium processors.

Processor | Clock speed | Year | Data bus width | Memory size | L1 cache (Data - Code) | L2 cache | Bus transfer speed
Pentium | 60, 66, 120, 133, 233 MHz | 1993 | 64 | 4 GByte | 8K - 8K | - | 60 - 66 MHz
Pentium Pro | 150 - 166 MHz | 1995 | 64 | 64 GByte | 8K - 8K | 256K | 60 - 66 MHz
Pentium II | 350, 400, 450 MHz | | | | 16K - 16K | 512K | 100 MHz
Pentium II Xeon | | | | 64 GByte | 16K - 16K | 512K or 1M | 100 MHz
Pentium III (Slot 1 version) | 1 GHz | | | 64 GByte | 16K - 16K | 512K | 100 MHz
Pentium III (Flip chip version) | 1 GHz | 1998 | 64 | 64 GByte | 16K - 16K | 256K | 100 MHz
Pentium III (Celeron) | 1 GHz | 1998 | 64 | 64 GByte | 16K - 16K | 256K | 66 MHz
Pentium IV | 1.3, 1.4, 1.5 GHz | 2000 | 64 | 64 GByte | 16K - 16K | 256K | 100 MHz
Table 1.2 Comparison between Pentium processors

• Pentium IV uses the RAMBUS memory technology in place of the SDRAM technology used in other Pentium processors.

1.2 Pentium Features
The Pentium processor family architecture contains all of the features of the 80486 microprocessor and provides significant additions and enhancements as given below:
• Wider Data Bus Width: The Pentium processors have a wider data bus. The data bus width has been increased from 32 bits to 64 bits to improve the data transfer rate. Burst read and burst write-back cycles are supported by the Pentium processors. In addition to the 64-bit bus, bus cycle pipelining has been added to allow two bus cycles to be in progress simultaneously.
• Faster Floating Point Unit: The floating-point unit has been completely redesigned over the 80486 CPU. Faster algorithms provide up to ten times speed-up for common operations including add, multiply and load.
• Improved Cache Structure: Pentium processors include separate code and data caches integrated on-chip to meet performance goals. Each cache is 8 Kbytes in size, with a 32-byte line size, and is 2-way set associative. Each cache has a dedicated Translation Lookaside Buffer (TLB) to translate linear addresses to physical addresses. The data cache is configurable to be write-back or write-through on a line-by-line basis and follows the MESI protocol. The data cache tags are triple ported to support two data transfers and an inquire cycle in the same clock. The code cache is an inherently write-protected cache. The code cache tags are also triple ported to support snooping and split line accesses.
Individual pages can be configured as cacheable or non-cacheable by software or hardware. The caches can be enabled or disabled by software or hardware.
• Dual Integer Processor: The Pentium processor has a dual integer processor. It allows execution of two instructions per clock.
• Branch Prediction Logic: The Pentium uses a technique called branch prediction to predict whether a branch will be taken or not. To implement branch prediction the Pentium processor has two prefetch buffers, one to prefetch code in a linear fashion, and one that prefetches code according to the Branch Target Buffer (BTB). Therefore, the needed code is almost always prefetched before it is required for execution.
• Data Integrity and Error Detection: The Pentium processors have added significant data integrity and error detection capability. Data parity checking is still supported on a byte-by-byte basis. Address parity checking and internal parity checking features have been added, along with a new exception, the machine check exception.
• Functional Redundancy Checking: The Pentium processors implement functional redundancy checking to provide maximum error detection of the processor and the interface to the processor. When functional redundancy checking is used, a second processor, the "checker", executes in lock step with the "master" processor. The checker samples the master's outputs, compares those values with the values it computes internally, and asserts an error signal if a mismatch occurs.
• Enhanced Virtual 8086 Mode: Enhancements to the virtual 8086 mode have been made to increase performance by reducing the number of times it is necessary to trap to a virtual 8086 monitor.
• Superscalar Processor: Processors capable of executing multiple instructions in parallel are known as superscalar processors.
The Pentium is capable, under special circumstances, of executing two integer or two floating-point instructions simultaneously, and thus it supports superscalar architecture.

The Pentium Pro is a still faster version of the Pentium. It contains a modified internal architecture that can schedule up to five instructions for execution, and an even faster floating-point unit. It also contains a 256 K-byte or 512 K-byte level two cache in addition to the 16 K-byte (8 K for data and 8 K for instructions) level one cache. The Pentium Pro includes error correction circuitry (ECC) to correct a one-bit error and indicate a two-bit error. It provides four additional address lines, which makes it possible to access 64 Gbytes of directly addressable memory space.

1.3 Pentium Architecture and Functional Description
Fig. 1.2 shows the internal architecture of the Pentium processor. As shown in Fig. 1.2, it is a complex processor with many interlocking parts. At the heart of the processor there are two pipelines, the U pipeline and the V pipeline. The U pipeline can execute all integer and floating-point instructions. The V pipeline can execute simple integer instructions and the FXCH floating-point instruction. Furthermore, during execution, the U and V pipelines are capable of executing two integer instructions at the same time, under special conditions.

Fig. 1.2 Pentium architecture block diagram

Bus Unit
The Pentium communicates with the outside world via a 32-bit address bus and a 64-bit data bus. The bus unit is capable of performing burst reads and writes of 32 bytes to memory, and through bus cycle pipelining it allows two bus cycles to be in progress simultaneously.
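The bus figures quoted above can be checked with simple arithmetic. The following Python sketch is illustrative only: the function names are hypothetical and not from the text, and it assumes a byte-addressable memory (so n address lines reach 2^n bytes) and that each transfer within a burst moves one full data-bus width.

```python
GIB = 1 << 30  # one gigabyte, counted as 2^30 bytes


def address_space_bytes(address_lines: int) -> int:
    """Bytes reachable with the given number of address lines (assumes byte addressing)."""
    return 1 << address_lines


def transfers_per_burst(burst_bytes: int, data_bus_bits: int) -> int:
    """Number of data-bus transfers needed to move one burst (assumes full-width transfers)."""
    bytes_per_transfer = data_bus_bits // 8
    return burst_bytes // bytes_per_transfer


# Pentium: 32-bit address bus
print(address_space_bytes(32) // GIB)   # 4  (Gbytes)
# Pentium Pro: four additional address lines, i.e. 36 in total
print(address_space_bytes(36) // GIB)   # 64 (Gbytes)
# A 32-byte burst over the 64-bit data bus
print(transfers_per_burst(32, 64))      # 4  (quad-word transfers)
```

The last result agrees with the four quad-word transfers that make up a burst cycle.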
The bus unit consists of the following functional entities:
• Address Drivers and Receivers: During bus cycles the address drivers push the address onto the processor's local address bus (A31:A3 and BE7:BE0). The address bus transfers addresses back to the Pentium address receivers during cache snoop cycles. Only address lines A31:A5 are input during cache snoop cycles.
• Write Buffers: The Pentium processor provides two write buffers, one for each of the two internal execution pipelines. This architecture improves performance when back-to-back writes occur.
• Data Bus Transceivers: The transceivers send data onto the Pentium processor's local data bus during write bus cycles, and receive data into the processor during read bus cycles.
• Bus Control Logic: The bus control logic controls whether a standard or burst bus cycle is to be run. Standard bus cycles are run to access I/O locations and non-cacheable memory locations, as well as for cacheable memory write operations. During these bus cycles the transfer size will be either 8, 16 or 32 bits as specified by the instruction. Burst cycles are run by the Pentium processor during cache line fills and during cache write-back bus cycles from the data cache. Four quad-words are transferred during each burst bus cycle.

Fig. 1.3 The elements comprising the Pentium processor bus unit

• Bus Master Control: Bus master control signals allow the processor to request the use of the buses from the arbiter and to be preempted by other bus masters in the system.
• Level Two (L2) Cache Control: The Pentium processor includes the ability to control the operation of an external L2 (secondary) cache.
• Internal Cache Control: Internal cache control logic monitors input signals to determine when to snoop the address bus, and drives output signals to notify external logic of the results of a snoop operation. It also ensures proper cache coherency.
• Parity Generation and Control: It generates even data parity for each of the eight data paths during write bus cycles and checks parity on read bus cycles. It also generates a parity bit for the address during write bus cycles and checks address parity during external cache snoop operations.

Code Cache
An 8 KB instruction cache is used to provide quick access to frequently used instructions. It holds copies of the most frequently used instructions, and it is dedicated to supplying instructions to each of the processor's execution pipelines. The cache is organized as a two-way set associative cache with a line size of 32 bytes. The cache directory is triple ported to allow two simultaneous accesses from the prefetcher and to support snooping. When an instruction is not found in the code (instruction) cache, it is read from external memory and a copy is placed into the code cache for future reference.

Prefetcher
The prefetcher requests instructions from the code cache. If the requested instruction is not in the cache, a burst bus cycle is run to external memory to perform a cache line fill.

Prefetch Buffers
The Pentium provides four prefetch buffers. They work as two independent pairs. When instructions are prefetched from the cache, they are placed into one pair of prefetch buffers, while the other pair remains idle. When a branch operation is predicted in the Branch Target Buffer (BTB), it requests the predicted branch's target addresses from the cache, and these are placed in the second pair of buffers that was previously idle. As a result, the processor gets the new instructions from the branch address with almost no delay.

Instruction Decode Unit
The Pentium provides two-stage decoding.
The instructions are decoded in two stages known as Decode 1 (D1) and Decode 2 (D2). During D1, the opcode is decoded in both pipelines to determine whether the two instructions can be paired according to the Pentium processor's pairing rules. If pairing is possible, the two instructions are sent simultaneously to the stage 2 decode. During D2, the addresses of memory-resident operands are calculated.

Control Unit
It is also referred to as the Microcode Unit. This control unit consists of the following sub-units:
• Microcode Sequencer
• Microcode Control ROM
This unit interprets the instruction word and microcode entry points fed to it by the Instruction Decode Unit. It handles exceptions, breakpoints and interrupts. In addition, it controls the integer pipelines and floating-point sequences.

Arithmetic/Logic Units (ALUs)
The Pentium provides two ALUs to perform the arithmetic and logical operations specified by the instructions in their respective pipelines. The ALU for the "U" pipeline can complete an operation prior to the ALU in the "V" pipeline, but the opposite is not true.

Address Generators
The Pentium provides two address generators (one for each pipeline). They generate the addresses specified by the instructions in their respective pipelines.

Data Cache
A separate internal data cache holds copies of the most frequently used data requested by the two integer pipelines and the Floating Point Unit. The internal data cache is an 8 KB write-back cache, organized as two-way set associative with 32-byte lines. The data cache directory is triple ported to allow simultaneous access from each of the pipelines and to support snooping.

Paging Unit
It is enabled by setting the PG bit in CR0. It translates the linear address (from the address generator) to a physical address. It can handle two linear addresses at the same time to support both pipelines.
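The cache organisation described above (8 KB, two-way set associative, 32-byte lines) determines how an address divides into a tag, a set index and a byte offset. The text does not spell this split out, so the following is a minimal Python sketch under stated assumptions: a 32-bit address and the conventional tag/index/offset layout. The names CacheGeometry, geometry and split_address are hypothetical.

```python
from typing import NamedTuple, Tuple


class CacheGeometry(NamedTuple):
    sets: int
    offset_bits: int
    index_bits: int
    tag_bits: int


def geometry(size_bytes: int, ways: int, line_bytes: int,
             address_bits: int = 32) -> CacheGeometry:
    """Derive set count and field widths (sizes assumed to be powers of two)."""
    lines = size_bytes // line_bytes          # total cache lines
    sets = lines // ways                      # lines are grouped into sets of `ways`
    offset_bits = line_bytes.bit_length() - 1 # log2(line size)
    index_bits = sets.bit_length() - 1        # log2(number of sets)
    tag_bits = address_bits - index_bits - offset_bits
    return CacheGeometry(sets, offset_bits, index_bits, tag_bits)


def split_address(addr: int, geo: CacheGeometry) -> Tuple[int, int, int]:
    """Split an address into (tag, set index, byte offset within the line)."""
    offset = addr & ((1 << geo.offset_bits) - 1)
    index = (addr >> geo.offset_bits) & ((1 << geo.index_bits) - 1)
    tag = addr >> (geo.offset_bits + geo.index_bits)
    return tag, index, offset


pentium_l1 = geometry(size_bytes=8 * 1024, ways=2, line_bytes=32)
print(pentium_l1)  # CacheGeometry(sets=128, offset_bits=5, index_bits=7, tag_bits=20)
print(split_address(0x12345678, pentium_l1))  # (74565, 51, 24)
```

Under these assumptions each 8 KB cache has 128 sets of two lines, and the five offset bits correspond to the 32 bytes of a line.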
Floating-Point Unit
The floating point unit performs floating point operations. It can accept up to two floating point operations per clock when one of the instructions is an exchange instruction.

1.4 Pin Description
Fig. 1.4 shows the pin diagram of the Pentium processor and Fig. 1.5 shows the pin diagram of the Pentium processor with functional grouping.

Fig. 1.4 Pin diagram of Pentium processor

Fig. 1.5

Pentium Hardware Signals
Common Signals Changed Functionality

A20M : Address Mask : When asserted, forces the Pentium to limit addressable memory to 1 MB to emulate the memory space of the 8086. This signal is active only in the real mode.
A31-A3 : These 29 address lines, together with the byte enable outputs, form the Pentium's 32-bit address bus. With this 32-bit address, a memory space of 4 gigabytes can be accessed.
ADS : Address Strobe : When low, indicates the beginning of a new bus cycle.
AHOLD : Address Hold : This signal is used to place the Pentium's address bus into a high impedance state so that an inquire cycle can be run.
AP (I/O) : Address parity is driven by the Pentium processor with even parity information. It is generated in the same clock in which the address is driven. Even parity must be driven back to the Pentium processor during inquire cycles on this pin in the same clock.
APCHK : The address parity check status pin is asserted if the Pentium processor has detected a parity error on the address bus during inquire cycles.
APICEN : Advanced Programmable Interrupt Controller (APIC) Enable : This signal is used to enable or disable the Pentium's internal APIC interrupt controller circuit.
BE7-BE0 : The byte enable pins are used to determine which bytes must be written to external memory, or which bytes were requested by the CPU for the current cycle. The byte enables are driven in the same clock as the address lines (A31-A3). The purpose of each byte enable is as follows :
    Output | Data Bus Enabled
    BE0 | D0-D7
    BE1 | D8-D15
    BE2 | D16-D23
    BE3 | D24-D31
    BE4 | D32-D39
    BE5 | D40-D47
    BE6 | D48-D55
    BE7 | D56-D63
BF : These inputs are sampled during reset and they control the ratio of bus frequency to CPU core frequency.
    BF = 1 : bus/core ratio = 2/3
    BF = 0 : bus/core ratio = 1/2
BOFF : This input causes the processor to terminate any bus cycle currently in progress and tri-state its buses. Execution of the interrupted bus cycle is restarted when BOFF goes high.
BP[3:2], PM/BP[1:0] : The breakpoint pins (BP3-BP0) correspond to the debug registers DR3-DR0. These pins externally indicate a breakpoint match when the debug registers are programmed to test for breakpoint matches. BP1 and BP0 are multiplexed with the performance monitoring pins (PM1 and PM0). The PB1 and PB0 bits in the Debug Mode Control Register determine whether the pins are configured as breakpoint or performance monitoring pins.
BRDY : Burst Ready : In the Pentium, the BRDY signal is used to indicate that the external device is ready to transfer data.
BREQ : Bus Request : This signal, when active, indicates that the Pentium has generated a bus request.
BT3-BT0 : The branch trace outputs provide bits 2-0 of the branch target linear address and the default operand size on BT3. These outputs become valid during a branch trace special message cycle.
BUSCHK : The bus check input allows the system to signal an unsuccessful completion of a bus cycle.
If this pin is sampled active, the Pentium processor will latch the address and control signals in the machine check registers. If, in addition, the MCE bit in CR4 is set, the Pentium processor will vector to the machine check exception.
CACHE : The output indicates whether the data associated with the current bus cycle is being read from or written to the data cache.
CLK : This is the clock signal for the Pentium. It decides the operating frequency of the Pentium. For example, to operate the Pentium at 66 MHz, we apply a 66 MHz clock to this pin.
D/C : Data/Code : It indicates whether the current bus cycle is accessing code (D/C = 0) or data (D/C = 1).
D63-D0 (I/O) : These are the 64 data lines for the processor. Lines D7-D0 define the least significant byte of the data bus; lines D63-D56 define the most significant byte of the data bus.
DP7-DP0 (I/O) : These are the data parity pins for the processor. There is one for each byte of the data bus. They are driven by the Pentium processor with even parity information on writes in the same clock as the write data. Even parity information must be driven back to the Pentium processor on these pins in the same clock as the data to ensure that the correct parity check status is indicated by the Pentium processor. DP7 applies to D63-D56; DP0 applies to D7-D0.
EADS (I) : External Address Strobe : It is used to indicate that an external address may be read from the address bus during an inquire cycle.
EWBE (I) : The external write buffer empty input, when inactive (high), indicates that a write cycle is pending in the external system.
FERR (O) : Floating Point Error : This output goes low when the floating point unit of the Pentium processor generates an error.
FLUSH (I) : When asserted, the cache flush input forces the Pentium processor to write back all modified lines in the data cache and code cache.
FRCMC (I) : The functional redundancy checking master/checker mode input is used to determine whether the Pentium processor is configured in master mode or checker mode. When configured as a master, the Pentium processor drives its output pins as required by the bus protocol. When configured as a checker, the Pentium processor tristates all outputs (except IERR) and samples the output pins. The configuration as a master/checker is set after RESET and may not be changed other than by a subsequent RESET.
HIT (O) : The hit indication is driven to reflect the outcome of an inquire cycle. If an inquire cycle hits a valid line in either the Pentium processor data or instruction cache, this pin is asserted two clocks after EADS is sampled asserted. If the inquire cycle misses the Pentium processor cache, this pin is negated two clocks after EADS.
HITM (O) : The hit to a modified line output is driven to reflect the outcome of an inquire cycle. It is asserted after inquire cycles which resulted in a hit to a modified line in the data cache. It is used to inhibit another bus master from accessing the data until the line is completely written back.
HLDA (O) : Hold Acknowledge : This output goes high in response to a HOLD request to indicate that the Pentium has been placed in the hold state.
HOLD (I) : When high, the Pentium tri-states its bus signals and activates HLDA.
IBT (O) : Instruction branch taken indicates that the Pentium has taken an instruction branch.
IERR (O) : The internal error pin is used to indicate two types of errors, internal parity errors and functional redundancy errors. If a parity error occurs on a read from an internal array, the Pentium processor will assert the IERR pin for one clock and then shut down. If the Pentium processor is configured as a checker and a mismatch occurs between the value sampled on the pins and the corresponding value computed internally, the Pentium processor will assert IERR two clocks after the mismatched value is returned.
IGNNE (I) : Ignore Numeric Exception : A low on this input allows the processor to continue executing floating-point instructions, even if an error is generated.
INIT (I) : The Pentium processor initialization input pin forces the Pentium processor to begin execution in a known state. The processor state after INIT is the same as the state after RESET except that the internal caches, write buffers, and floating point registers retain the values they had prior to INIT. If INIT is sampled high when RESET transitions from high to low, the Pentium processor will perform built-in self test prior to the start of program execution.
INV : The invalidation input determines the final cache line state (shared or invalidated) in case of an inquire cycle hit.
IU : This output goes high for one clock cycle each time an instruction completes in the U pipeline.
IV : This output goes high for one clock cycle each time an instruction completes in the V pipeline.
KEN : Cache Enable : This signal is used to determine whether the current cycle is cacheable or not.
LOCK : Bus Lock : This signal goes low to indicate that the current bus cycle is locked and may not be interrupted by any other bus master.
M/IO : Memory/Input-Output : This signal indicates the type of the current bus cycle.
    M/IO = 0 : I/O cycle
    M/IO = 1 : memory cycle
NA : An active next address input indicates that the external memory system is ready to accept a new bus cycle although all data transfers for the current cycle have not yet completed.
NMI : This is the non-maskable interrupt signal of the Pentium.
PBGNT : Private Bus Grant : This signal is used in a dual-processing system to indicate when private bus arbitration is allowed.
PBREQ : Private Bus Request : This signal is used to request a private bus operation in a dual-processing system.
PCD : The page cache disable pin reflects the state of the PCD bit in CR3, the Page Directory Entry, or the Page Table Entry.
The purpose of PCD is to provide an external cacheability indication on a page-by-page basis.
PCHK : Data Parity Check : This output goes low if the Pentium detects a parity error on the data bus. In the Pentium, parity checking has been extended : if PEN is also asserted low during the same cycle, the Pentium will save a copy of the address and control signals in an internal machine check register. Additionally, if the MCE bit in the new CR4 register is set, a machine check exception is generated.
PEN : Parity Enable : If this input is low during the same cycle in which a parity error is detected, the Pentium will save a copy of the address and control signals in an internal machine check register.
PHIT : Private Hit : It is used to maintain local cache coherency in a dual-processor system.
PHITM : Private Modified Hit : It is used in conjunction with PHIT to maintain local cache coherency in a dual-processor system.
PICCLK : Programmable Interrupt Controller Clock : This signal controls the serial data rate in the internal APIC.
PRDY : The probe ready output pin indicates that the processor has stopped normal execution in response to the R/S pin going active, or Probe Mode being entered. This output is used for debugging purposes.
PWT (O) : The page write through pin reflects the state of the PWT bit in CR3, the page directory entry, or the page table entry. The PWT pin is used to provide an external write back indication on a page-by-page basis.
R/S : The run/stop input is an asynchronous, edge-sensitive interrupt used to stop the normal execution of the processor and place it into an idle state.
RESET : This signal forces the Pentium to initialize its registers to a known state, invalidate the code and data caches, and fetch its first instruction from address FFFFFFF0H. This signal must be active for at least 1 ms after power on.
SCYC : The split cycle output is asserted during misaligned LOCKed transfers to indicate that more than two cycles will be locked together.
This signal is defined for locked cycles only.
SMI : The system management interrupt causes a system management interrupt request to be latched internally. When the latched SMI is recognized on an instruction boundary, the processor enters System Management Mode.
SMIACT : An active system management interrupt active output indicates that the processor is operating in System Management Mode.
STPCLK : Stop Clock : When low, this signal causes the Pentium to stop its internal clock.
TCK : The testability clock input provides the clocking function for the Pentium processor boundary scan in accordance with the IEEE Boundary Scan interface (Standard 1149.1).
TDI : The test data input is a serial input for the test logic. TAP instructions and data are shifted into the Pentium processor on the TDI pin on the rising edge of TCK when the TAP controller is in an appropriate state.
TDO : Test Data Output : This signal is used to send serial test information on the falling edge of TCK.
TMS : The value of the test mode select input signal sampled at the rising edge of TCK controls the sequence of TAP controller state changes.
TRST : When asserted, the test reset input allows the TAP controller to be asynchronously initialized.
W/R : Write/Read : This signal indicates whether the current bus cycle is a read cycle or a write cycle.
    W/R = 0 : Read cycle
    W/R = 1 : Write cycle
WB/WT : The write back/write through input allows a data cache line to be defined as write back or write through on a line-by-line basis.

Table 1.5
Note : Non-shaded signals are of the Pentium processor (510\60, 567\66) and shaded signals are the additional signals provided in the Pentium processor (610\75, 735\90, 815\100, 1000\120, 1110\133).

Pin Grouping According to Function
Table 1.6 organizes the pins with respect to their function.

Function | Pins
Clock | CLK
Initialization | RESET, INIT
Address Bus | A31-A3, BE7-BE0
Address Mask | A20M
Data Bus | D63-D0
Address Parity | AP, APCHK
Data Parity | DP7-DP0, PCHK, PEN
Internal Parity Error | IERR
System Error | BUSCHK
Bus Cycle Definition | M/IO, D/C, W/R, CACHE, SCYC, LOCK
Bus Control | ADS, BRDY, NA
Page Cacheability | PCD, PWT
Cache Control | KEN, WB/WT
Cache Snooping/Consistency | AHOLD, EADS, HIT, HITM, INV
Cache Flush | FLUSH
Write Ordering | EWBE
Bus Arbitration | BOFF, BREQ, HOLD, HLDA
Interrupts | INTR, NMI
Floating Point Error Reporting | FERR, IGNNE
System Management Mode | SMI, SMIACT
Functional Redundancy Checking | FRCMC (IERR)
TAP Port | TCK, TMS, TDI, TDO, TRST
Breakpoint/Performance Monitoring | PM/BP0, PM/BP1, BP2, BP3
Power Management | STPCLK
Probe Mode | R/S, PRDY
Branch Trace | BT3-BT0, IBT
Dual Processing | CPUTYP, D/P, DPEN, PBGNT, PBREQ, PHIT, PHITM
Programmable Interrupt Control | PICCLK, PICD0, PICD1, APICEN
Bus Frequency | BF0 - BF1

Table 1.6 Pin functional grouping

1.5 Pentium Real Mode
The Pentium microprocessor can operate in either Real Mode or Protected Mode. When the Pentium is reset or powered up, it is initialized in Real Mode. The Pentium maintains object code compatibility with the 8086, 80286, 80386, and 80486 running in real mode. In this mode, the Pentium supports the same architecture as the 8086, but it can access the 32-bit register set of the Pentium. In real mode, it is also possible to use 32-bit addressing modes with the override instruction prefixes. In this section, we will see the operation of the Pentium in real mode.

1.5.1 Real Mode Programming Model
The programming model makes it easier to understand the microprocessor in a programming environment. The real mode programming model gives the programming environment for the Pentium in real mode. It shows only those parts of the microprocessor which the programmer can use, such as the various registers within the microprocessor. Fig. 1.6 shows the real mode programming model for the Pentium microprocessor. In the diagram, only the shaded portion is a part of real mode.
It consists of eight 16-bit registers (IP, CS, DS, SS, ES, FS, GS and the Flag register) and eight 32-bit registers (EAX, EBX, ECX, EDX, ESP, EBP, ESI, EDI). In real mode, the Pentium can access CR0, which is used to enter the protected mode. The Protection Enable bit (PE) is used to switch the Pentium from real to protected mode. From this description it can be seen that the Pentium in real mode is an 8086 with extended registers and two additional data segment registers, FS and GS. It also implements separate memory and I/O address spaces. The memory space is 1,048,576 bytes (1 Mbyte) and the I/O address space is 65,536 bytes (64 Kbytes), which is similar to the 8086 memory and I/O address space.

1.5.2 Memory Addressing in Real Mode
As mentioned earlier, in Real Mode, memory size is limited to 1 Mbyte. Due to this, only the A0-A19 address lines are active. The higher address lines A20-A31 are normally high. But once an intersegment jump or call is executed, these address lines (A20-A31) are low during CS-relative memory cycles. Even though a 1 Mbyte memory address space is available in real mode, all this memory cannot be active at one time. Actually, the 1 Mbyte of memory is partitioned into 64K (65536) byte segments. A segment represents an independently addressable unit of memory consisting of 64K consecutive byte-wide storage locations. Each segment has its own starting address, i.e. the lowest-addressed byte storage location. The segment registers hold the starting addresses of the active segments in the entire memory. In the Pentium, only six out of the 16 (1 Mbyte / 64 Kbyte) 64 Kbyte segments can be active at a time (Code Segment, Stack Segment, Data Segment, ES, FS and GS). Fig. 1.7 shows the active memory segments.
Fig. 1.6 Real mode programming model for Pentium processor

Fig. 1.7 Active segments of memory

The paging mechanism in the Pentium is not active in real mode. Thus, in real mode the linear addresses are the same as physical addresses. Physical addresses are generated in Real Mode by adding the contents of the appropriate segment register, shifted left by 4 bits, to an effective address. If this addition of the shifted segment register contents and the effective address generates a carry, then, unlike the 8086, the resulting 21-bit address is the linear address. This means that in the 8086 the carried bit is truncated, whereas in the Pentium the carried bit is stored as bit 20 of the linear address. Fig. 1.8 shows the real address mode address formation and the 21-bit address formation when a carry is generated.

Fig. 1.8 Real address mode addressing

All segments in Real mode are at most 64 Kbytes long. These segments may be read, written, or executed. The Pentium generates a general protection exception (interrupt 13) if the effective address is beyond the legal range from 0 to FFFFH.
All segment registers are accessible to the programmer, so the programmer can load values into the segment registers that make the segments contiguous, adjacent, disjointed, or even overlapping. Fig. 1.9 shows all possible ways of defining segments in the memory.
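The real mode address formation described above, including the difference between the 8086 and the Pentium when a carry is produced, can be sketched in Python as follows. This is an illustrative sketch only: the function name and the keep_bit_20 parameter are hypothetical and are introduced just to contrast the two behaviours.

```python
def real_mode_linear_address(segment, offset, keep_bit_20=True):
    """Form a real mode address from a 16-bit segment and a 16-bit offset.

    keep_bit_20=True  models the Pentium: a carry is kept as bit 20.
    keep_bit_20=False models the 8086: the carry is truncated.
    """
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    # Segment register contents shifted left by 4 bits, plus the effective address.
    address = (segment << 4) + offset
    if keep_bit_20:
        return address & 0x1FFFFF  # 21-bit linear address
    return address & 0xFFFFF       # 20-bit address, wraps around at 1 Mbyte


# Two different segment:offset pairs can name the same location,
# which is what makes overlapping segments possible.
same_location = (
    real_mode_linear_address(0x1000, 0x0100)
    == real_mode_linear_address(0x1010, 0x0000)
)
```

For instance, segment FFFFH with offset 0010H gives 100000H when bit 20 is kept, but wraps to 00000H when the carry is truncated as in the 8086.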
For example, segments A and B are contiguous, whereas segments B and C are overlapping.

Fig. 1.9 Contiguous, adjacent, disjointed and overlapping segments

1.5.3 Handling Interrupts and Exceptions in Real Mode
The Pentium supports Real Mode interrupts and exceptions much like the 8086. In the Pentium, addresses from 0 through 3FFH (400H memory locations) are dedicated to the Interrupt Descriptor Table (IDT) after Reset. This table contains pointers that define the starting points of the interrupt service routines. Each pointer in the table requires four bytes of memory. Thus, the table contains up to 256 interrupt pointers (4 x 256 = 1024 = 400H). The four bytes in each pointer represent two words. The word having the higher memory address holds the segment base address, whereas the word having the lower memory address holds the offset. Fig. 1.10 shows the Interrupt Descriptor Table (IDT). As in the 8086, interrupts are recognized by their numbers/types. Each time an interrupt occurs, the Pentium multiplies the interrupt number/type by four to generate an index into the interrupt descriptor table.
In the Pentium, the Interrupt Descriptor Table is relocatable. The base address of the interrupt descriptor table is held in the IDTR (Interrupt Descriptor Table Register). The programmer can change this address by loading a different address into the IDTR, using the LIDT instruction. The LIDT instruction allows the relocation of the base address and is also used to specify the size of the IDT. If an interrupt occurs and the corresponding entry in the interrupt table is beyond the limit stored in the IDTR, exception 8 (interrupt table limit too small) will occur. Table 1.7 summarises the Pentium Real Address Mode exceptions.

Fig. 1.10 Interrupt descriptor table
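The table lookup described above (interrupt number multiplied by four, offset in the lower word and segment in the higher word) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names, the idt_base parameter and the bytes-like memory image are hypothetical and used only for illustration, and the words are read in the little-endian byte order used by the 80x86 family.

```python
import struct


def vector_entry_address(vector, idt_base=0):
    """Address of the four-byte pointer for an interrupt number (0-255)."""
    if not 0 <= vector <= 255:
        raise ValueError("interrupt number must be in 0..255")
    # Each pointer occupies four bytes, so the index is the number times four.
    return idt_base + 4 * vector


def read_vector(memory, vector, idt_base=0):
    """Return (segment, offset) stored in the table for an interrupt number.

    `memory` is any bytes-like image of memory (an assumption for this sketch).
    """
    entry = vector_entry_address(vector, idt_base)
    # The word at the lower address is the offset, the word at the higher
    # address is the segment; both are 16-bit little-endian values.
    offset, segment = struct.unpack_from("<HH", memory, entry)
    return segment, offset
```

With the default base of 0, interrupt number 21H is found at address 84H, and the last pointer (number 255) ends at 3FFH, in agreement with the 400H locations reserved for the table.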
Interrupt Number | Cause of Exception | Description
0 | DIV, IDIV | Divide error
1 | All | Debug exceptions
3 | INT | Breakpoint
4 | INTO | Overflow
5 | BOUND | Bounds check
6 | Any undefined opcode or LOCK used with wrong instruction | Invalid opcode
7 | ESC or WAIT | Coprocessor not available
8 | INT vector is not within IDTR limit | Interrupt table limit too small
9-11 | | Reserved
12 | Memory operand crosses offset 0 or 0FFFFH | Stack fault
13 | Memory operand crosses offset 0FFFFH, or attempt to execute past offset 0FFFFH, or instruction longer than 15 bytes | Pseudo-protection exception
14, 15 | | Reserved
16 | ESC or WAIT | Coprocessor error
0-255 | INT | Two-byte software interrupt

Table 1.7 Pentium real-address mode exceptions

Note 1 : Some debug exceptions point to the faulting instruction, others to the next instruction. By examining the contents of DR6, it is possible to determine whether the debug exception is pointing to the faulting instruction or to the next instruction.
Note 2 : Coprocessor errors are reported on the first ESC or WAIT instruction after the ESC instruction that caused the error.

1.6 Pentium RISC Features
Because of the advances in microelectronic manufacturing technology, a number of changes in computer architecture have taken place over the last decade. It became possible to pack a large amount of logic into a small area of a silicon wafer. New computers were designed which use processors with complex instructions and addressing modes, which we call Complex Instruction Set Computers (CISC). But the problem that arose with the CISC machines was that their instructions required multiple clock cycles to execute because of the large amount of logic packed into a single package. This degraded the performance of CISC machines. This problem is solved by a new design technique called Reduced Instruction Set Computer (RISC).
The important factor considered while designing RISC machines is the use of fewer instructions and simpler addressing modes. Because there are fewer instructions, the number of operations is reduced and they can easily be implemented on a silicon wafer, which increases the speed and hence improves the performance. In this section, we will discuss the features of a RISC processor and see which of them are applied in the design of the Pentium processor.

1. Reduced accesses to main memory
Ideally, computer memory should be fast, large and inexpensive. Unfortunately, it is impossible to meet all three of these requirements simultaneously. Increased speed and size are achieved at increased cost. A very fast memory system can be achieved if SRAM chips are used. These chips are expensive, and for cost reasons it is impracticable to build a large main memory using SRAM chips. The only alternative is to use DRAM chips for large main memories.
The processor fetches the code and data from the main memory to execute the program. The DRAMs which form the main memory are slower devices, so it is necessary to insert wait states in memory read/write cycles. This reduces the speed of execution. Thus, though great advances have been made in memory technology, processors are much faster than memories. Since the processor operates much faster than the memory, the processor has to wait during each memory access.
The RISC design includes a technique which reduces the number of accesses to main memory. Most computer programs work with only small sections of code and data at a particular time. In the memory system, a small section of SRAM, referred to as cache memory, is added along with the main memory. The program which is to be executed is loaded in the main memory, but the part of the program (code) and data that is in use at a particular time is usually accessed from the cache memory. This is accomplished by loading the active part of the code and data from main memory into cache memory. Whenever the processor tries to read data from main memory, the cache is examined first. The addresses are stored in the cache. If one of these addresses matches the address being used for the memory read, the cache will supply the data, which is called a cache hit. Generally, cache is ten times faster than the main memory. When the required data is not found in the cache, it is called a cache miss and the processor has to access main memory in this situation. After a cache miss, a copy of the new data is written into the cache, so that the data can be obtained from the cache whenever it is needed again.
The Pentium contains two caches of 8 KB each. An 8 KB instruction cache stores frequently used instructions and an 8 KB data cache stores frequently used data. Initially, each cache is empty and is filled as the program executes.

2. Simple instructions and addressing modes (RISC feature, not available in Pentium)
When the processor uses fewer and simpler instructions and addressing modes, the implementation of operations on the silicon wafer is easier. It also reduces the complexity of the instruction decoder, the addressing unit and the execution unit. In this case, the machine can be operated at a higher clock rate because less work has to be done in each clock period. In practice, it is possible to use fewer and simpler instructions and addressing modes because research showed that programmers use only a small subset of the instructions available on the processor they are using.
From this point of view, the Pentium is not a RISC processor but a CISC processor. The reason is that the Pentium should remain compatible with the installed software of the entire 80x86 family. Each and every instruction and addressing mode of the previous processor, the 80486, should be kept as it is.

3. Large sets of registers and make good use of them
In the first feature, we have seen how the number of accesses to the memory affects the performance of a processor. Similarly, the number of registers available in a processor can affect its performance. When a complex calculation is to be performed by a processor, it may require the use of several data values. If all these data values are stored in memory, then a number of memory accesses are required during the calculation to use those data values. But when a number of registers are available in the processor, the data values can be stored in registers instead of in memory. Accessing the internal registers to read data values during calculations is much faster than accessing memory for the same purpose. Thus, it is always good for the processor to have large sets of internal registers.
The Pentium has the following sets of registers.
i) Seven general purpose registers, all of them 32 bits wide.
ii) Six segment registers, all of them 16 bits wide.
iii) A 32-bit stack pointer.
iv) Eight floating-point registers, all of them 80 bits wide.
Thus, the Pentium has a large set of registers (like a RISC).

4. Pipelining
We know that more than one clock cycle is involved in the instruction cycle. These clock cycles are required to perform the various steps in the instruction execution. These steps belong to the various processing stages in the instruction cycle. These are :
• S1 - Fetch (F) : Read the instruction from the memory.
• S2 - Decode (D) : Decode the opcode and fetch the source operand(s) if necessary.
• S3 - Execute (E) : Perform the operation specified by the instruction.
• S4 - Store (S) : Store the result in the destination.
Usually, an instruction is executed by performing the above-mentioned stages one after the other.
When these stages for several instructions are performed simultaneously to reduce the overall processing time, the processing is called instruction pipelining. Refer to Fig. 1.11. Here, instruction processing is divided into four stages, hence it is known as a four-stage instruction pipeline. With this subdivision, and assuming equal duration for each stage, we can reduce the execution time for 4 instructions from 16 time units to 7 time units.

Clock cycle | 1 | 2 | 3 | 4 | 5 | 6 | 7
Instruction I1 | F1 | D1 | E1 | S1 | | |
Instruction I2 | | F2 | D2 | E2 | S2 | |
Instruction I3 | | | F3 | D3 | E3 | S3 |
Instruction I4 | | | | F4 | D4 | E4 | S4

Fig. 1.11 Four stage instruction pipelining

In this instruction pipelining, up to four instructions are in progress at any given time. This means that four distinct hardware units are needed, as shown in Fig. 1.12. These units are implemented such that they are capable of performing their tasks simultaneously and without interfering with one another. Information from one stage is passed to the next stage with the help of interstage buffers.

Fig. 1.12 Hardware organisation for four-stage instruction pipeline (F : fetch instruction, D : decode instruction and fetch operands, E : execute operation, S : store result, connected through the interstage buffers B1, B2 and B3)

Coming to the point of performance analysis, we can say that pipelining can reduce the effective number of clock cycles required for instruction execution and thus increases the rate of executing instructions significantly. It approaches the ideal value of one clock cycle per instruction, as shown in Fig. 1.11. However, in practice this ideal value cannot be attained for a variety of reasons.
The performance of a processor improves tremendously because of pipelining. There are two types of pipelines in the Pentium : instruction pipelines (U and V, covered in a further section) and a bus cycle pipeline that performs special types of bus cycles. The instruction pipelines include five stages.
They operate independently. Also, Pentium employs a branch prediction technique (explained in detail in a later section). Normally, instructions flow through the U and V pipelines in sequence. With branch prediction, Pentium predicts whether the normal program flow will change or not. This technique helps to keep a steady stream of instructions flowing into the pipelines, which increases the rate of instruction execution and hence improves the performance of the Pentium. In this feature the Pentium is very much like a RISC machine.
5. Extensive utilization of the compiler
When a program is written in a higher level language (e.g. C language), each statement of the program is converted into assembly language instructions during compilation. A compiler written for the Pentium can exploit the advances in the Pentium architecture by optimizing the assembly language code. Some examples are given below.
a) Arrange pairs of instructions such that they will execute in parallel in the floating-point unit or the dual integer pipelines.
b) Reorder the instructions such that the Pentium's branch prediction technique is utilized properly.
c) If possible, replace an instruction with an equivalent instruction that requires fewer clock cycles or fewer bytes of machine code. For example, MOV EAX, 0 can be replaced by SUB EAX, EAX.
d) Use the instruction/data cache or algorithms to allocate the minimum number of processor registers during parsing of an arithmetic statement.
Thus, a properly written Pentium compiler helps to achieve high performance, like in a RISC or CISC machine.
From all the above discussion, it is clear that the Pentium contains both RISC and CISC characteristics.
1.7 Pentium Super-scalar Architecture
Processors capable of parallel execution of multiple instructions are known as superscalar processors.
The Pentium is capable, under special circumstances, of executing two integer or two floating point instructions simultaneously, and thus it supports superscalar architecture. However, there are restrictions placed on a pair of integer instructions attempting parallel execution. These restrictions are discussed in section 1.9. For floating point instructions, there is a restriction on which instructions may execute as the first instruction of a pair and which as the second instruction of the pair.
First instruction in the pair | Second instruction in the pair
FLD, FLD ST(i), FADD, FSUB, FMUL, FDIV, FCOM, FUCOM, FTST, FABS, FCHS | FXCH
Modern compilers play an important role in achieving superscalar-level performance on the Pentium processor. During code generation they order the instructions so as to form pairs without any data dependency, and they produce allowable combinations of integer and floating-point instructions for simultaneous execution.
1.8 Pipelining
In the previous section we have seen that the rate of instruction execution can be improved with pipelining. In Pentium, there are two instruction pipelines, the U pipeline and the V pipeline. These are five-stage pipelines and they operate independently. The five stages, in order, are as follows :
1. PF Prefetch
2. D1 Instruction Decode
3. D2 Address Generate
4. EX Execute, Cache and ALU Access
5. WB Writeback
Instruction stream -> PF -> D1 -> D2 -> EX -> WB
Fig. 1.13 Stages in U and V instruction pipelines
Both pipelines U and V include the above five stages. The U pipeline can execute any processor instruction, but the V pipeline executes only simple instructions. An instruction that does not require microcode control to execute and generally takes one clock cycle to complete is referred to as a simple instruction.
For example, register-to-register MOVs, INC, DEC and near conditional jumps (e.g. JZ, JNZ etc.) are simple instructions. It is to be noted that some simple instructions may take two or three clock cycles. These are arithmetic and logical instructions that use both register and memory operands. Refer Fig. 1.14, which shows the pipelined instruction execution.
Fig. 1.14 Pipelined instruction execution
The following sequence of steps explains the pipelined instruction execution in Pentium.
1. Prefetch (PF) stage : Instructions are prefetched from the instruction cache or memory and fed into the PF stage of both the pipelines U and V.
2. Instruction Decode (D1) stage : In this stage, the decoder in each pipeline checks whether the current pair of instructions can execute together. If an instruction contains a prefix byte, an additional clock cycle is required in this stage. Also, such an instruction may only execute in the U pipeline and may not be paired with any other instruction.
3. Address Generate (D2) stage : In this stage, the addresses of the operands that reside in memory are calculated.
4. Execute (EX) stage : In this stage, operands are read from the data cache or memory and ALU operations are performed. Also, branch predictions for instructions (except conditional branches in the V pipeline) are verified in this stage.
5. Writeback (WB) stage : This is the final stage. In this stage, the results of the completed instructions are written and the predictions for conditional branch instructions are verified.
When the instructions in pipelines U and V reach the EX stage, it may happen that one of them stalls and requires additional clock cycles for execution. No work is done during the stall. So the pipeline stall lowers performance.
There are various situations in which instructions stall, for example when the operands required for the operation are not found in the data cache. If the instruction in the U pipeline stalls, the instruction in the V pipeline also stalls. But if the instruction in the V pipeline stalls, the instruction in the U pipeline may continue executing. The instructions in both pipelines must reach the last stage, WB, before another pair of instructions or the next single instruction may enter the EX stage.
1.9 Instruction Pairing Rules
The Pentium processor can issue one or two instructions every clock. In order to issue two instructions simultaneously, they must satisfy the following conditions :
• Both instructions in the pair must be "simple" instructions. Simple instructions are entirely hardwired; they do not require any microcode control and, in general, execute in one clock. Examples of simple instructions are register-to-register MOVs, INC, DEC and near conditional jumps (JZ, JNZ, etc.). There is one more restriction on a conditional jump instruction : it must be the second instruction in the pair. The arithmetic and logical instructions are also simple instructions; however, they may take two or three clock cycles because these instructions use both register and memory operands. Sequencing hardware is used to allow them to function as simple instructions. The following integer instructions are considered simple and may be paired :
1. MOV reg, reg/mem/imm
2. MOV mem, reg/imm
3. ALU reg, reg/mem/imm
4. ALU mem, reg/imm
5. INC reg/mem
6. DEC reg/mem
7. PUSH reg/mem
8. POP reg
9. LEA reg, mem
10. JMP/CALL/Jcc near
11. NOP
• Shifts or rotates (SHL, SHR, SAL, SAR, ROL, ROR, RCL or RCR) can only pair in the U pipe.
• ADC and SBB can only pair in the U pipe.
• JMP, CALL and Jcc can only pair in the V pipe (Jcc = jump on condition code).
• Neither instruction can contain BOTH a displacement and an immediate operand.
For example :
    mov [bx+2], 3    ; 2 is a displacement, 3 is immediate
    mov mem1, 4      ; mem1 is a displacement, 4 is immediate
• Prefixed instructions can only pair in the U pipe. Prefixed instructions (such as MOV AL, ES:[DI]) may only execute in the U pipeline. Therefore, only one prefixed instruction is allowed in a pair.
• The U pipe instruction must be only 1 byte in length or it will not pair until the second time it executes from the cache.
• There should not be any data dependencies between the two instructions.
Data Dependency : A data dependency between two instructions exists if the result of the first instruction is an operand for the second instruction (read-after-write dependency). That is, the operand for the second instruction cannot be read from the register until the first instruction writes its result into that register.
There can be no read-after-write or write-after-write register dependencies between the instructions, except for special cases involving the flags register and the stack pointer.
    mov ebx, 2      ; writes to EBX
    add ecx, ebx    ; reads EBX and ECX, writes to ECX
                    ; EBX is read after being written, no pairing
    mov ebx, 1      ; writes to EBX
    mov ebx, 2      ; writes to EBX
                    ; write after write, no pairing
The flags register exception allows an ALU instruction to be paired with a Jcc even though the ALU instruction writes the flags and the Jcc reads the flags. For example :
    cmp al, 0       ; CMP modifies the flags
    je  addr        ; JE reads the flags, but pairs
    dec cx          ; DEC modifies the flags
    jnz loop1       ; JNZ reads the flags, but pairs
The stack pointer exception allows two PUSHes or two POPs to be paired even though they both read and write the SP (or ESP) register.
    push eax        ; ESP is read and modified
    push ebx        ; ESP is read and modified, but still pairs
1.10 Branch Prediction
We have seen that pipelined instruction execution is a valid technique for improving the instruction execution rate and hence the performance of a processor.
But this improvement is reduced when program transfer instructions such as JMP, CALL, RET and the conditional branch instructions are present in the instruction stream. With pipelined instruction execution, the instruction pipeline is always filled with a group of instructions stored in sequential memory locations. A program transfer instruction changes this normal sequence of execution, so all the instructions that entered the pipeline after it become incorrect. In this case, the instructions that follow in sequence as a result of the branch must be loaded into the pipeline, and the incorrect instructions that were loaded wrongly must be discarded. This is called ‘flushing’ of the pipeline. After flushing, the sequence of instructions that is correct after execution of the branch instruction is loaded into the pipeline. No work is done while the pipeline stages are reloaded. These disturbances in the pipelined instruction execution are called ‘bubbles’.
Pentium overcomes this problem by using a technique called ‘dynamic branch prediction’. Whether the branch will be taken or not taken is decided by prediction. If the prediction is true, the pipeline is not flushed, no cycles are lost and there are no bubbles in the pipeline. If the prediction is false, the pipeline is flushed, so cycles are wasted and bubbles appear in the pipeline; the pipeline is then loaded with the correct group of instructions. Naturally, it is best if the predictions are true most of the time. Pentium uses a branch target buffer (BTB) for dynamic branch prediction. The BTB is a special cache which stores the branch instructions that occur in the instruction stream together with their target addresses. The BTB also stores two history bits which indicate the execution history of the last two branch instructions.
The BTB uses the history bits to predict whether the branch is taken or not taken. When a new target address is placed into the BTB, these history bits are set to 11. Each time the corresponding branch instruction is encountered, the history bits are updated. The history bits become 00 if there are repeated failures to take the branch. In that case, the prediction becomes ‘not taken’. Fig. 1.15 shows the operations that take place in the dynamic branch prediction technique.
Fig. 1.15 Operations in dynamic branch prediction technique
Note : 1. Each state is represented by history bits, H, and prediction, P. 2. Prediction is either ‘branch is taken’, indicated by BT, or ‘branch is not taken’, indicated by BNT.
The prediction remains ‘branch is taken’ until both history bits become zero. The BTB is accessed during the D1 stage of the U and V pipelines. For a new branch instruction, there is no target address in the BTB and the prediction is ‘not taken’. There are two 32-byte prefetch buffers. One buffer prefetches instructions from the current program address and the other buffer prefetches instructions from the target address when the BTB's prediction is ‘branch taken (BT)’. If the predictions are correct, clock cycles are not wasted. If a prediction is incorrect, or the prediction is correct but the target address is wrong, the pipelines are flushed. This costs three clock cycles in the U pipeline and four clock cycles in the V pipeline. Most of the time, conditional jumps are used to form loops in programs. The prediction ‘branch is taken (BT)’ matches the multiple passes required through a loop. In Pentium, the history bits are set to 11 for a new entry. So, using dynamic branch prediction, the wastage of clock cycles is minimised.
1.11 The Instruction and Data Caches
In this section, we will see the concept of cache memory, the advantages of using caches and the Pentium cache organisation.
1.11.1 Cache Memory
In a computer system, the program to be executed is loaded into the main memory (DRAM). The processor then fetches the code and data from the main memory to execute the program. The DRAMs which form the main memory are slower devices, so it is necessary to insert wait states in memory read/write cycles. This reduces the speed of execution. To speed up the process, high speed memories such as SRAMs must be used. But considering the cost and space required for SRAMs, it is not desirable to use SRAMs to form the main memory. The solution to this problem comes from the fact that most microcomputer programs work with only small sections of code and data at a particular time. A small section of SRAM, referred to as cache memory, is therefore added to the memory system along with the main memory. The program to be executed is loaded into the main memory, but the part of the program (code) and data in use at a particular time is usually accessed from the cache memory. This is accomplished by loading the active part of the code and data from main memory into cache memory. The cache controller looks after this swapping between main memory and cache memory with the help of the DMA controller. When the processor finds the addressed code or data in the cache, it is called a ‘cache hit’. The percentage of accesses where the processor finds the code or data word it needs in the cache memory is called the ‘hit rate’. It is given by,
\[ \text{Hit rate} = \frac{\text{Number of hits}}{\text{Number of read/write bus cycles}} \times 100\,\% \]
The hit rate is normally greater than 90 percent. When the required code or data is not found in the cache, it is called a ‘cache miss’. Thus, a cache is a special type of high-speed RAM used to speed up accesses to memory and reduce traffic on the processor's buses. Advanced processors use an on-chip cache to achieve high speed accesses to memory and hence higher performance.
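The hit rate formula above can be tried out with a short sketch. It also shows, as an additional illustration, how the average number of wait states per bus cycle follows when cache hits and main memory accesses need different numbers of wait states. The function names and the sample figures (950 hits, 50 misses, two wait states for main memory) are assumptions chosen for illustration only.

```python
def hit_rate_percent(hits, misses):
    """Hit rate = hits / total read/write bus cycles * 100."""
    total = hits + misses
    if total == 0:
        raise ValueError("no bus cycles recorded")
    return 100.0 * hits / total


def average_wait_states(hits, misses, cache_wait_states, memory_wait_states):
    """Average wait states per bus cycle.

    Assumption: every hit costs cache_wait_states and every miss costs
    memory_wait_states.
    """
    total = hits + misses
    if total == 0:
        raise ValueError("no bus cycles recorded")
    return (hits * cache_wait_states + misses * memory_wait_states) / total


if __name__ == "__main__":
    # Hypothetical program: 950 bus cycles served by the cache, 50 by main memory.
    print(hit_rate_percent(950, 50))           # 95.0
    print(average_wait_states(950, 50, 0, 2))  # 0.1
```

Even a small miss percentage matters here, because only the misses pay the main memory wait states.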
Example 1.1 : The application program in a computer system with cache uses 1400 instruction acquisition bus cycles from cache memory and 100 from main memory. What is the hit rate ? If the cache memory operates with zero wait states and the main memory bus cycles use three wait states, what is the average number of wait states experienced during the program execution ?
Solution :
\[ \text{Hit rate} = \frac{1400}{1400 + 100} \times 100 = 93.33\,\% \]
\[ \text{Total wait states} = 1400 \times 0 + 100 \times 3 = 300 \]
\[ \text{Average wait states} = \frac{\text{Total wait states}}{\text{No. of memory bus cycles}} = \frac{300}{1500} = 0.2 \]
1.11.2 Two Level Cache System
We know that the on-chip cache is a high-speed cache, but its size is limited by space constraints. Therefore, to design a high performance system, a secondary cache, called the external cache, is used along with the primary on-chip cache. Such a system is called a two level cache system, and in such systems the secondary cache is constructed with SRAM chips. Fig. 1.16 shows a two level cache system in a microcomputer.
As shown in Fig. 1.16, an on-chip cache supplies instructions and data to the CPU's pipeline. When code or data is required from memory, the processor first searches for it in the on-chip cache (internal cache). If it is found in the internal cache (a cache hit), a copy of it is sent to the pipeline very quickly. Usually, this takes just one clock cycle. If it is not found in the internal cache (a cache miss), the processor examines the external cache (second level cache). If a cache miss occurs at the external cache, or there is no external cache, the processor accesses the main memory. The processor then writes a copy of the code or data from main memory into the cache.
Fig. 1.16 Two-level cache system in a microcomputer
The secondary cache is much slower than the primary cache.
But the size of the secondary cache is large, which ensures a high hit rate. The secondary cache thus reduces the impact of the main memory speed on the performance of a computer. The average access time experienced by the CPU in a two level cache system is
\[ t_{av} = h_1 t_{L1} + (1 - h_1) h_2 t_{L2} + (1 - h_1)(1 - h_2) t_m \]
where
\(t_{L1}\) is the access time and \(h_1\) is the hit rate of L1,
\(t_{L2}\) is the access time and \(h_2\) is the hit rate of L2,
\(t_m\) is the access time of main memory.
The fraction of accesses that hit in the secondary cache is given by the term \((1 - h_1) h_2\), and the fraction of accesses that miss in the secondary cache is given by the term \((1 - h_1)(1 - h_2)\).
A ‘hit-ratio’ specifies the ratio of hits to total cache accesses. If the hit ratio is 0.9, it means that the cache contains the requested information nine times out of ten. Thus, the average access time depends on the hit ratio. The average access time is given by,
\[ T_{avg} = \text{hit-ratio} \times T_{cache} + (1 - \text{hit-ratio}) \times (T_{cache} + T_{RAM}) \]
1.11.3 Pentium Cache Organisation
The Pentium processor provides separate caches for data and code. Both caches are organized as two-way set-associative caches with 128 sets. This gives 256 entries per cache. There are 32 bytes in a line (64 bytes per set), resulting in 8 KB of storage per cache. The data and instruction caches may be accessed simultaneously.
Fig. 1.17 shows the internal structure of the instruction and data caches. As mentioned earlier, each consists of 128 sets of two lines each. Each line is associated with a tag. The tags are triple ported, meaning that they can be accessed from three different places at the same time. Two of these ports serve the U and V pipelines, which access the data cache to read/write instruction operands. The third port is used for a special operation called bus snooping. The code cache tags are also triple ported to support snooping and split line accesses.
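The cache geometry described above (128 sets, two lines per set, 32 bytes per line) can be checked with a small sketch. The split of an address into tag, set index and byte offset is not stated in the text; it is derived here under the assumption of a 32-bit physical address and conventional set-associative indexing, with the low-order bits selecting the byte within a line. All names are illustrative.

```python
LINE_SIZE_BYTES = 32   # bytes per cache line (from the text)
NUM_SETS = 128         # sets per cache (from the text)
NUM_WAYS = 2           # two-way set associative (from the text)
ADDRESS_BITS = 32      # assumption: 32-bit physical address

OFFSET_BITS = LINE_SIZE_BYTES.bit_length() - 1       # log2(32)  = 5
INDEX_BITS = NUM_SETS.bit_length() - 1               # log2(128) = 7
TAG_BITS = ADDRESS_BITS - INDEX_BITS - OFFSET_BITS   # remaining upper bits = 20


def cache_size_bytes():
    """Total data storage of one cache: line size x ways x sets."""
    return LINE_SIZE_BYTES * NUM_WAYS * NUM_SETS


def split_address(addr):
    """Split a physical address into (tag, set index, byte offset within the line)."""
    offset = addr & (LINE_SIZE_BYTES - 1)
    set_index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, set_index, offset


if __name__ == "__main__":
    print(cache_size_bytes())                 # 8192 bytes = 8 KB
    print(OFFSET_BITS, INDEX_BITS, TAG_BITS)  # 5 7 20
    tag, set_index, offset = split_address(0x12345678)
    print(hex(tag), set_index, offset)        # 0x12345 51 24
```

Under these assumptions the set index selects one of the 128 sets, and the tag stored with each of the two lines in that set is compared against the upper address bits. The same tags are what a snoop access has to check.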
[Snooping is used to maintain consistent data in a multiprocessor system where each processor has a separate cache.]
Fig. 1.17 Structure of 8 KB instruction and data cache (sets 0 to 127, each with two 32-byte lines)
Each entry in the data cache can be configured for write back or write through. The code cache is an inherently write-protected cache. It is write protected to prevent self-modifying code from changing the executing program. Each cache uses parity bits to maintain data integrity. Each tag is provided with one parity bit. There is one parity bit for every eight bytes of data (a quarter of a line or entry) in the instruction cache.
In Pentium, individual pages can be configured as cacheable or non-cacheable by software or hardware. The caches can also be enabled or disabled by software or hardware.
Translation Lookaside Buffers
Each cache has dedicated Translation Lookaside Buffers (TLBs) to translate linear addresses into physical addresses. Physical addresses are used to access the cache because the same address is used to access main memory. The TLBs are also caches. Table 1.8 gives the details of the TLBs in the data cache and instruction cache.
TLB in data cache | TLB in instruction cache
2 TLBs | 1 TLB
First : 4-way set associative with 64 entries. It translates addresses for 4 KB pages of main memory. | 4-way set associative with 32 entries. It translates addresses for both 4 KB pages and 4 MB pages of main memory. 4 MB pages are cached as blocks of 4 KB each.
Second : 4-way set associative with 8 entries. It translates addresses for 4 MB pages of main memory. |
Both TLBs are parity protected and dual ported. | TLB is parity protected.
The cache controller uses the Least-Recently-Used (LRU) algorithm to replace entries in all three TLBs. For that, Pentium provides a 3-bit LRU counter for each set.
Table 1.8 TLBs for data and instruction cache
Fig.
1.18 shows the overall cache organisation of the Pentium processor.
Fig. 1.18 Pentium cache organisation (data cache : 2-way set associative, 8 KB = 32 bytes x 2 x 128, with a 64-entry and an 8-entry 4-way set associative TLB; instruction cache : 2-way set associative, 8 KB, with a 32-entry 4-way set associative TLB)
Translating Linear Address into Physical Addresses with a TLB
Fig. 1.19 shows how a TLB is used to translate a linear address into a physical address. The upper 20 bits of the linear address are checked against four tags and, in case of a hit, translated into the upper 20 bits of the physical address. The lower 12 bits of the physical address are the same as the lower 12 bits of the linear address.
Fig. 1.19 Generation of physical address from linear address
Cache Coherency in a Multiprocessor System
Cache updating schemes eliminate the data inconsistency in main memory caused by cache write operations. However, in multiprocessor systems, several processors may require a copy of the same memory block, and each stores its copy in its own cache. If each processor is allowed to update the data of the cached memory block in its individual cache, an inconsistent view of memory can result. This problem is known as the ‘cache coherence’ problem.
To avoid such inconsistency and to maintain cache coherency in its data cache, Pentium uses the MESI (Modified/Exclusive/Shared/Invalid) protocol. The MESI protocol uses two bits to record the state of each cache line. The state of each cache line is marked as modified, exclusive, shared or invalid. The meaning of each state in this protocol is as given below.
• Modified : The line in the cache has been modified, so it differs from main memory, and this line is available only in this cache.
• Exclusive : The line in the cache is the same as that in main memory and it is not present in any other cache.
• Shared : The line in the cache is the same as that in main memory and the same line may be present in one or more other caches.
• Invalid : The line in the cache does not contain valid data.
1.12 Floating Point Unit
In the 8086, 80286 and 80386, floating point operations were performed with the help of external coprocessors. Table 1.9 gives the list of coprocessors used with the 80x86 family.
Processor (80x86 family) | Coprocessor (80x87 family)
8086/88 | 8087
80286 | 80287
80386 | 80387
Table 1.9 Processors and coprocessors
The 80x87 coprocessor, when used with an 80x86 processor, shared the address bus, data bus and control bus with the processor. Considerable time was required for synchronization between the processor and the coprocessor when performing floating point operations. This problem was solved by placing the coprocessor on the processor chip. This was done for the 80486 and the Pentium. Since the coprocessor is on the same chip as the processor, communication is faster and execution takes place quickly. Thus, the 80486 and the Pentium have an internal floating point unit (FPU). The Pentium contains an improved, totally redesigned FPU compared with that used in the 80486.
Many floating point instructions that required a large number of clock cycles on the 80x87 coprocessor units need only a few clock cycles on the 80486 and the Pentium. New algorithms also increase the speed of floating point operations. Consider the example of the floating-point multiply instruction, FMUL. Table 1.10 shows the number of clock cycles required for the execution of this instruction on different coprocessors.
Coprocessor (80x87 family) | Minimum clock cycles required
8087 | 130
80287 | 130
80387 | 29
80486 FPU | 16
Pentium FPU | 1
Table 1.10 FMUL instruction performance
Thus, for many floating point instructions, there is an improvement in each generation, with the highest improvement in the Pentium's FPU. Such a high FPU speed is achieved using a pipeline. The FPU pipeline contains eight stages, as shown in Fig. 1.20.
Fig. 1.20 Stages in FPU pipeline
As shown in Fig. 1.20, the first five stages are the ones that form the U pipeline, which processes integer instructions. The only difference is in the fifth stage. In the U pipeline the fifth stage is WB (Writeback, as discussed earlier). In the FPU pipeline, this fifth stage becomes the first stage of floating point execution. The FPU pipeline consists of these five stages and three extra stages. Thus there are eight stages in total. All these stages and their functions are explained below.
i) PF : Prefetch
ii) D1 : Instruction decode
iii) D2 : Address generation
iv) EX : Memory and register read, floating-point data converted into memory format, memory write.
The above stages are already explained in the section on pipelining.
v) X1 : Stage one of floating point execution. In this stage, memory data is converted into floating-point format and the operand is written to the floating point register file. Using bypass 1, data is sent back to the EX stage. This allows a floating point register write operation in the X1 stage to bypass the floating point register file. The result is sent to the instruction performing a floating-point register read in the EX stage.
vi) X2 : Stage two of floating-point execution.
vii) WF : Round the floating-point result and write it to the floating-point register file. The bypass 2 path is followed to send data back to the EX stage. Using bypass 2, the result of an arithmetic instruction in stage WF is made available to the next instruction fetching operands in the EX stage.
viii) ER : Error reporting. The status word is updated.
There are eight 80-bit floating-point registers in the floating-point register file, ST(0) through ST(7). Two read and two write operations can be performed simultaneously, since there are two ports in the read section and two ports in the write section. The data is written to the two write ports from the X1 and WF stages of the pipeline. Pentium's FPU is designed such that fast floating-point execution can be achieved.
Review Questions
1. Briefly explain the historical evolution of microprocessors.
2. Explain important features of 80286 microprocessor.
3. Explain important features of 80386 microprocessor.
4. Explain important features of 80486 microprocessor.
5. Explain important features of Pentium microprocessor.
6. Explain the significant additions and enhancements in the Pentium processor.
7. Draw and explain the block diagram of Pentium processor.
8. Explain any five Pentium processor signals.
9. Draw the programmer's model of Pentium in real mode.
10. Give the maximum memory addresses available in real mode.
11. Describe how physical address is obtained in real mode.
12. Write short notes on a) Pentium RISC features. b) Pentium super-scalar architecture.
13. What do you mean by simple instructions ?
14. What is data dependency ?
15. What is pipelining ?
16. Explain the pipelined instruction execution with the help of block diagram.
17. Explain the instruction pairing rules with the help of suitable examples.
18. What is branch prediction ?
19. Explain the dynamic branch prediction technique used in Pentium processors.
20. What is cache memory ?
21. Define hit rate.
22. Explain the two level cache system.
23. Draw and explain the internal structure of instruction and data cache of Pentium processor.
24. What is bus snooping ?
25. What is TLB ?
26. Write a short note on instruction and data cache in Pentium.
27. Draw and explain the Pentium cache organisation.
28. Explain how linear address is converted into physical address using TLBs.
29. What is cache coherency ?
30. How does Pentium maintain cache coherency ?
31. What is MESI protocol ?
32. Write a short note on floating point unit of Pentium processor.

Bus Cycles and Memory and I/O Organisation
2.1 Introduction
This chapter gives information about the bus cycles and the memory and I/O organisation of the Pentium processor. We begin with the explanation of the RESET operation.
2.2 RESET Operation
When the reset pin of the Pentium is activated, the BIST (Built-In Self-Test) for the Pentium is initiated. The BIST tests 70 percent of the internal structure of the Pentium in approximately 150 µs. As in the 80486, the test report in the Pentium is stored in the EAX register. The test has passed if EAX is zero. The value of EAX can be tested after a reset to determine whether an error was detected. Table 2.1 shows the values in the various registers of the Pentium processor after reset.
Register | Reset value
EAX | 0 (if test passes)
EDX | 0500XXXXH
EBX, ECX, ESP, EBP, ESI and EDI | 0
EFLAGS | 2
EIP | 0000FFF0H
CS | F000H
DS, ES, FS, GS and SS | 0
GDTR and TSS | 0
CR0 | 60000010H
CR2, CR3 and CR4 | 0
DR0-DR3 | 0
DR6 | FFFF0FF0H
DR7 | 00000400H
Table 2.1 Register values in Pentium processor after reset
2.3 Bus Operations and Bus Cycles
The Pentium processor performs the following different operations over its address and data buses :
• Data transfers (both single cycle and burst transfers).
• Interrupt acknowledge cycles.
• Inquire cycle for examining the internal code and data cache.
• I/O cycle.
In this section, we are going to discuss the basic operations and the purpose of these bus cycles. The current bus cycle in the Pentium processor is decided by the state of the M/IO, D/C, W/R, CACHE and KEN signals. This is illustrated in Table 2.2.
M/IO | D/C | W/R | CACHE | KEN | Cycle description
0 | 0 | 0 | 1 | X | Interrupt acknowledge
0 | 0 | 1 | 1 | X | Special cycle
0 | 1 | 0 | 1 | X | I/O read, non-cached
0 | 1 | 1 | 1 | X | I/O write, non-cached
1 | 0 | 0 | 1 | X | Code read, 8 bytes, non-cached
1 | 0 | 0 | X | 1 | Code read, 8 bytes, non-cached
1 | 0 | 0 | 0 | 0 | Code read, 32 bytes, burst, cached
1 | 1 | 0 | 1 | X | Memory read, up to 8 bytes, non-cached
1 | 1 | 0 | X | 1 | Memory read, up to 8 bytes, non-cached
1 | 1 | 0 | 0 | 0 | Memory read, 32 bytes, burst, cached
1 | 1 | 1 | 1 | X | Memory write, up to 8 bytes, non-cached
1 | 1 | 1 | 0 | X | 32 byte cache write back, burst
Note : X = don't care
Table 2.2 Bus Cycle Encodings
Additional decoding is required to indicate special bus cycles. The byte enable outputs decide which special bus cycle is currently running, as shown in Table 2.3.
BE7 | BE6 | BE5 | BE4 | BE3 | BE2 | BE1 | BE0 | Special bus cycle
1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | Shutdown
1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | Flush cache
1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | Halt
1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | Writeback
1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | Flush acknowledge
1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | Branch trace message
Table 2.3 Special bus cycles
2.4 Bus Cycle States
The state of the Pentium bus cycle depends upon the type of bus cycle being processed. There are six possible states for a Pentium bus cycle. These are :
Ti (Idle state) : After a hardware reset the Pentium bus is in the idle state. In this state, no bus cycle is currently running.
T1 (First state) : This is the first state of the bus cycle. During T1, a valid address is output on the address lines and ADS is activated.
T2 (Second state) : This is the second state of the bus cycle. During T2, data is read or written and the BRDY input is examined.
T12 : It indicates the overlapping period of the first and second states. This state exists when a second bus cycle starts before the first one completes. The data for the first bus cycle is transferred, and a new address is output on the address lines.
TD : This state inserts a dead state between two consecutive cycles.
Fig. 2.1 shows the state transition diagram for the Pentium processor.
[Fig. 2.1 State transition diagram for Pentium processor]

2.5 Non-Pipelined Bus Cycles
[Fig. 2.2 Typical nonpipelined bus cycle]
Fig. 2.2 shows a typical nonpipelined bus cycle. During T1, the Pentium sends the address, bus status signals and control signals. In the case of a write cycle, the data to be output is also sent on the data bus during T1. As shown in the figure, after the address access time the read or write data transfer takes place over the data bus. This activity is carried out in T2.

2.5.1 Non-pipelined Read Cycle
Fig. 2.3 shows the timings for two nonpipelined read cycles (with and without a wait state). The first read cycle is without a wait state and the second cycle is with a wait state. The sequence of events for the nonpipelined read cycle is as follows :
- The read operation starts at the beginning of phase 1 in the T1 state of the bus cycle. In this phase, the Pentium sends the address on the address bus and enables signals according to the data transfer type. After sending the address, in the same phase, the Pentium activates its ADS signal to indicate that a valid address is placed on the address bus. In phase 1 of the T1-state the Pentium also activates the bus cycle definition signals. For a read cycle W/R is low. M/IO is high for a memory read and low for an I/O read. The D/C signal differentiates between data and instruction code : it is high if data is to be read and low if an instruction code is to be read.
- At the end of phase 2 of the T1-state, ADS is returned to its inactive logic 1 state. The address bus, byte enable pins, and bus status pins remain active through the end of the read cycle.
- At the end of phase 2 of the T2-state the BRDY signal is sampled by the Pentium. A logic 1 on this signal inserts a wait state in the current bus cycle to extend the bus cycle.
- In a wait state (Tw), the signals from the T2-state are maintained throughout the wait state period. It is just a repetition of the T2-state. Thus the period of one wait state (Tw = T2) is one clock period, which is 50 ns for 20 MHz clock operation.

[Fig. 2.3 Non-pipelined read cycle : timing of the CLK, Address, W/R, ADS and BRDY signals for an idle state, two nonpipelined read cycles and an idle state]

2.5.2 Non-pipelined Write Cycle
Fig. 2.4 shows the timings for two nonpipelined write cycles (with and without a wait state). The first write cycle is without a wait state and the second cycle is with a wait state. The sequence of events for the nonpipelined write cycle is as follows :
- The nonpipelined write cycle is similar to the nonpipelined read cycle. The write operation starts at the beginning of phase 1 in the T1 state of the bus cycle. In this phase, the Pentium sends the address on the address bus and enables signals according to the data transfer type. After sending the address, in the same phase, the Pentium activates its ADS signal to indicate that a valid address is placed on the address bus. In phase 1 of the T1-state the Pentium also activates the bus cycle definition signals. For a write cycle, W/R is high. M/IO is high for a memory write and low for an I/O write. The D/C signal is high.
- At the beginning of phase 2 of the T1-state, the Pentium sends data on the data bus. This data remains valid until the start of phase 2 of the T1-state of the next bus cycle.

[Fig. 2.4 Non-pipelined write cycle]

- At the end of phase 2 of the T1-state, ADS is returned to its inactive logic 1 state.
The address bus, byte enable pins, and bus status pins remain active through the end of the write cycle.
- At the end of phase 2 of the T2-state the BRDY signal is sampled by the Pentium. A logic 1 on this signal inserts a wait state in the current bus cycle to extend the bus cycle.
- In a wait state (Tw), the signals from the T2-state are maintained throughout the wait state period. It is just a repetition of the T2-state.

2.6 Pipelined Read/Write Cycle
As mentioned earlier, address pipelining allows bus cycles to be overlapped, increasing the amount of time available for the memory or I/O device to respond. Fig. 2.5 shows both nonpipelined and pipelined read and write cycles. Cycle 1 and cycle 2 in the diagram show nonpipelined write and read cycles, respectively, whereas cycle 3 and cycle 4 show pipelined write and read cycles, respectively. This diagram also shows how a wait state can be avoided using a pipelined bus cycle.
In the pipelined bus cycle the address for the next bus cycle is sent during the T2-state of the current cycle. In the Pentium, the NA (next address) signal initiates address pipelining. The Pentium samples the NA signal at the beginning of phase 2 of any T state in which ADS is not active, specifically :
- In the second T-state of a non-pipelined address cycle
- In the first T-state of a pipelined address cycle
- In any wait state of a non-pipelined or pipelined address cycle, unless NA has already been sampled active.
In Fig. 2.5 NA is tested as 0 (active) during T2 of cycle 2, which ensures that the Pentium executes the next cycle as a pipelined bus cycle. Cycle 2 (the nonpipelined read cycle) is also extended with one wait state because the BRDY pin is not active. In this wait state, the valid address for the next bus cycle is sent on the address bus, as the next bus cycle is a pipelined bus cycle.
[Fig. 2.5 Pipelined Read/Write Cycle : idle state, nonpipelined write (cycle 1), nonpipelined read (cycle 2), pipelined write (cycle 3), pipelined read (cycle 4), idle state]

The next cycle (cycle 3) is a pipelined write cycle. In this cycle, data is sent on the data bus in phase 2 of the T1P-state and remains valid for the rest of the cycle. The BRDY signal is sampled at the end of the T2P-state. As it is low, the write cycle is completed without a wait state. Fig. 2.5 shows that NA is active during T1P of cycle 3, which ensures that the Pentium executes the next cycle as a pipelined bus cycle.
The next cycle (cycle 4) is a pipelined read cycle. In this cycle, the BRDY signal is tested as 0 at the end of phase 2 of the T2P-state. This means that the read cycle is completed without a wait state. It is important to note that, due to the pipelined address cycle, the access time is extended and one state (T-wait) of the read cycle is saved.

2.7 Burst Cycle
In the Pentium, memory data can be read using a burst cycle. It is the most efficient way of accessing data. The burst cycle in the Pentium transfers four 64-bit numbers per burst cycle in five clocking periods. This is illustrated in Fig. 2.6. Therefore, with a clock period of 15.15 ns, a burst cycle without wait states requires on average (15.15 ns x 5)/4 = 18.94 ns for each memory data transfer.

[Fig. 2.6 Burst cycle for Pentium processor]

2.8 Memory Organisation
The memory system for the Pentium processor is 4 Gbytes in size, the same as in the 80386DX and 80486 microprocessors. The only difference is in the width of the memory data bus. The Pentium uses a 64-bit data bus to address memory organised in eight banks that each contain 512 Mbytes of data. This is illustrated in Fig. 2.7.
As shown in Fig. 2.7, the Pentium memory system is divided into eight banks that each store a byte of data with a parity bit. The memory system has 4 Gbytes of memory, beginning at location 00000000H and ending at location FFFFFFFFH.
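As a rough illustration of the eight-bank organisation described above, the following Python sketch maps a physical address and a transfer size onto the banks that take part in the transfer. The function names are hypothetical. The sketch also assumes that bank n holds every byte whose address has the value n in its three low-order bits, which is the usual arrangement for a 64-bit data bus but is not spelled out in the text.

```python
BANKS = 8                  # eight byte-wide banks on the 64-bit data bus
BANK_SIZE = 512 * 2**20    # 512 Mbytes per bank, 4 Gbytes in total


def banks_for_access(address, size):
    """Return the bank numbers used by a transfer of `size` bytes.

    Assumption (not stated in the text): bank n stores every byte whose
    address has n in its three low-order bits.
    """
    if size not in (1, 2, 4, 8):
        raise ValueError("size must be 1, 2, 4 or 8 bytes")
    first = address % BANKS
    if first + size > BANKS:
        # The data straddles two quadwords, so one transfer cannot cover it.
        raise ValueError("access crosses a quadword boundary")
    return list(range(first, first + size))


def location_in_bank(address):
    """Return the location inside each bank selected by the upper address bits."""
    return address // BANKS
```

For example, a doubleword read at A004H would use banks 4 to 7 under this assumption, while an aligned quadword read uses all eight banks in a single transfer.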
Bank selection is accomplished by the bank enable signals (BE7-BE0), one for each bank. These separate memory banks allow the Pentium to access any single byte, word, double word or quad word with one memory transfer cycle. Refer to Fig. 2.7.

[Fig. 2.7 Pentium memory system organisation : eight banks (Bank 7 to Bank 0) of 512M x 8, each with a parity bit and selected by BE7 to BE0]

In the Pentium, a double-precision floating point number can be retrieved in one read cycle because a double-precision floating point number is 64 bits wide and the data bus of the Pentium is also 64 bits wide.
The Pentium has the ability to check and generate parity for the address bus (A31 - A5) during certain operations. The AP signal provides the system with parity information and the APCHK signal indicates a bad parity check for the address bus. The Pentium takes no action when an address parity error is detected. Therefore, the error must be assessed by the system, and the system must take appropriate action (an interrupt), if so desired.

2.9 I/O Organisation
Input/Output devices can be interfaced with Pentium systems in two ways.
1. I/O mapped I/O
2. Memory mapped I/O

2.9.1 I/O Mapped I/O
In I/O mapped I/O, the I/O devices are treated separately from memory. The Pentium supports software and hardware features for separate memory and I/O address spaces. Fig. 2.8 shows the memory and I/O spaces in real mode.
The Pentium has four special instructions, IN, INS, OUT, and OUTS, to transfer data through input/output ports in an I/O mapped I/O system. The M/IO signal is always low when the Pentium is executing these instructions, so the M/IO signal is used to generate separate addresses for Input/Output. Only 256 (2^8) I/O addresses can be generated when the direct addressing method is used.
By using the indirect addressing method this range can be extended up to 65536 (2^16) addresses.

[Fig. 2.8 Memory and I/O spaces in real mode : memory address space 00000H to FFFFFH, I/O address space 0000H to FFFFH including page 0, with port 0 shown as a 16-bit port and as a 32-bit port]

2.9.2 Memory Mapped I/O
In memory mapped I/O, an I/O device is placed in the memory address space of the microcomputer system. The I/O device is connected as if it were a memory location. For this reason, the method is known as memory mapped I/O. In a microcomputer system with memory-mapped I/O, some of the memory address space is dedicated to the I/O system.
Fig. 2.9 shows memory mapped I/O devices in the Pentium memory address space. Here, 4096 memory addresses from D0000H through D0FFFH are assigned to I/O devices. The contents of D0000H represent byte-wide I/O port 0; the contents of D0000H and D0001H represent word-wide port 0; and the contents of D0000H through D0003H represent double-word-wide port 0.

[Fig. 2.9 Memory mapped I/O devices : I/O addresses from port 0 to port 4095 within the memory address space ending at FFFFFH, with port 0 shown as a 16-bit port and as a 32-bit port]

The I/O system for the Pentium is identical to that of the 80386 microprocessor and is completely compatible with earlier Intel microprocessors. In the Pentium, the I/O port number appears on address lines A15 - A3, with the bank enable signals used to select the actual memory banks used for the I/O transfer.
As in the 80386 microprocessor, the I/O privilege information is added to the TSS segment when the Pentium is operated in protected mode. This provides I/O protection and allows I/O ports to be selectively inhibited. If an inhibited I/O location is accessed, the Pentium generates a type 13 interrupt to indicate the I/O privilege violation.
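The port arithmetic used in this section can be sketched in a few lines of Python. This is only an illustration : the constant and function names are hypothetical, and the memory-mapped block D0000H through D0FFFH is simply the example region quoted for Fig. 2.9, not a fixed property of the Pentium.

```python
DIRECT_PORTS = 2**8       # port number encoded in the instruction (8 bits)
INDIRECT_PORTS = 2**16    # port number supplied as a 16-bit value

MMIO_BASE = 0xD0000       # first address of the example block in Fig. 2.9
MMIO_SIZE = 4096          # D0000H through D0FFFH


def can_use_direct_addressing(port):
    """True if the port number fits the 256-port direct addressing range."""
    return 0 <= port < DIRECT_PORTS


def mmio_range(offset, width_bytes):
    """Return (first, last) memory addresses of a memory-mapped port.

    `offset` is the byte offset of the port inside the example block and
    `width_bytes` is 1, 2 or 4 for a byte, word or double-word port.
    """
    if width_bytes not in (1, 2, 4):
        raise ValueError("width must be 1, 2 or 4 bytes")
    if not 0 <= offset <= MMIO_SIZE - width_bytes:
        raise ValueError("port lies outside the memory-mapped block")
    first = MMIO_BASE + offset
    return first, first + width_bytes - 1
```

Calling mmio_range(0, 4) reproduces the double-word-wide port 0 of the text, which occupies D0000H through D0003H.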
2.10 Data Transfer Mechanism - 8-bit, 16-bit, 32-bit and 64-bit

Address Translation
The Pentium's address bus is designed to address 64-bit devices. It consists of the A31 : A3 and BE7 : BE0 signals. Signals BE7 to BE0 are used to select the eight data bytes that make up the 64-bit data bus. But in a PC environment not all devices are 64-bit devices. In such cases, the address requirements depend on the device size, as listed below :
32-bit devices : A31 : A2 and BE3 : BE0
16-bit devices : A31 : A1 and BHE and BLE
8-bit devices : A31 : A0
The Pentium does not support these address requirements. External logic is required for this address translation (Refer Fig. 2.7). The address translation is typically done in the expansion bus control logic for smaller devices that are integrated onto the system board or reside in expansion slots.

Data Bus Steering
The external logic must also ensure that information read from and written to 8-, 16-, and 32-bit devices is transferred over the correct data path(s) (Refer Fig. 2.10).

[Fig. 2.10 Address translation for 8, 16, and 32-bit devices]

Smaller devices such as 8-bit devices connect to the lower data paths only, while the Pentium processor, when reading from a device, expects data from given locations to be transferred over their respective data paths. Data from a specified address location must therefore be directed, or steered, to the path over which the Pentium processor expects it. Conversely, when the Pentium processor writes data to a device, it assumes that the device is connected to all 8 data paths (that is, a 64-bit device). However, if the device is smaller than 64 bits, the data paths used by the Pentium processor may not connect to the smaller device, and again the data must be steered to the correct path. This is implemented with a series of transceivers that can pass data from one path to another.

Data Bus Steering for 8-Bit Devices
Fig. 2.11 shows the data bus steering logic required for 8-bit devices. As shown in Fig. 2.11, an 8-bit device connects only to the lower data path (SD7 : SD0).

[Fig. 2.11 Data bus steering transceivers required by 8-bit devices]

Let us consider the instruction MOV EBX, [A004H] and assume that the memory device is 8-bit. Since the destination register (EBX) is 32-bit, it is necessary to retrieve 4 bytes from the 8-bit memory device. To execute this instruction, the processor runs a single memory read bus cycle to get the contents of the four locations starting at memory location A004H. The processor has no idea that the memory device being accessed is an 8-bit device.
To satisfy the processor's request, the external logic has to run multiple bus cycles to access the four bytes of data and keep the BRDY signal inactive so that the processor waits for these bus cycles to complete. The address translation logic uses the byte enable signals to generate A2, A1 and A0, which are required to address the 8-bit device. The byte enable signals also specify the data paths over which the Pentium processor expects the data (in our case, paths 4, 5, 6 and 7).

Accessing First Byte
The address translation logic converts the quadword address output by the Pentium processor to the byte address (A004H) required by the 8-bit device. When the 8-bit device is ready to complete the first transfer, the bus control logic activates the steering logic (path 0/4 transceiver) to transfer the contents of data path 0 to data path 4. The data is then latched into the latch on data path 4 and the steering logic is disabled.

Accessing Second Byte
The bus control logic increments the address to select the next location (A005H) in the 8-bit memory device. Again the 8-bit device delivers data on path 0. Now this data is transferred through the path 0/5 transceiver and latched into the path 5 latch by the steering logic.
The steering logic is then disabled.

Accessing Third Byte
Again the bus control logic increments the address to select the next location (A006H). This time the steering logic directs the data accessed from the 8-bit device to data path 6 by activating the path 0/6 transceiver and then latching the data into the path 6 latch.

Accessing Fourth Byte
The same process is repeated. In this case the device address is A007H and the data is latched into the path 7 latch.
When all four bytes of data are present on data paths 4 through 7, the bus control logic asserts the BRDY signal, telling the processor that valid data is present on the data bus. The processor then resumes its bus cycle, latches the contents of data paths 4 through 7 and ends the bus cycle.

Data Bus Steering for 16-Bit Devices
Fig. 2.12 shows the data steering logic required by a 16-bit device. As shown in Fig. 2.12, three transceivers are used to transfer data from the lower byte (byte 0) of the 16-bit device onto paths 2, 4 and 6, and another three transceivers are used to transfer data from the upper byte (byte 1) of the 16-bit device onto paths 3, 5 and 7. A process similar to the one described for data steering of an 8-bit device is repeated. If we consider the previous example (MOV EBX, [A004H]), a 16-bit device requires only two accesses instead of four, since each access transfers two bytes.

[Fig. 2.12 Data bus steering transceivers required by 16-bit devices]

Accessing First and Second Bytes
The address translation logic converts the quadword address sent by the Pentium processor to the word address required by the 16-bit device. It also generates the BHE and BLE signals required to access the 16-bit device. In our case (MOV EBX, [A004H]), the conversion results in the word address A004H with the BHE and BLE signals asserted low.
Due to this, the 16-bit device delivers 16 bits of data : the lower byte, from address A004H, over data path 0 and the upper byte, from address A005H, over data path 1. At this time, the bus control logic activates the data bus steering logic (path 0/4 and path 1/5 transceivers) to transfer the contents of data paths zero and one to data paths four and five respectively (Refer Fig. 2.12). The data is then latched into the latches on data paths four and five and the steering logic is disabled.

Accessing Third and Fourth Bytes
The bus control logic increments the address to select locations A006H and A007H in the 16-bit device. With both BHE and BLE signals asserted, data from the 16-bit device is delivered over paths zero and one. The steering logic then activates transceivers 0/6 and 1/7 to transfer the contents of paths zero and one to paths six and seven. These contents are then latched into the corresponding latches (Refer Fig. 2.12).
When all four bytes of data are present on data paths 4 through 7, the bus control logic asserts the BRDY signal, telling the processor that valid data is present on the data bus. The processor then resumes its bus cycle, latches the contents of data paths 4 through 7 and ends the bus cycle.

Data Bus Steering for 32-bit Devices
Fig. 2.13 shows the data bus steering logic required when the Pentium processor accesses a 32-bit device. As shown in Fig. 2.13, one transceiver is used to transfer data from each byte to the corresponding higher byte data path. A 32-bit device is capable of delivering all four bytes within a single bus cycle. However, in the example (MOV EBX, [A004H]), the data from locations A004H through A007H is delivered over data paths zero through three, while the Pentium expects the data on data paths four through seven.
The address translation logic converts the quadword address sent by the Pentium into a doubleword address. It also generates the BE0 through BE3 signals to access the 32-bit device.
In our case (MOV EBX, [A004H]), the conversion results in the doubleword address A004H with the BE0 through BE3 signals asserted low. Due to this, the 32-bit device delivers its 32-bit contents over data paths 0 through 3. The bus control logic then activates the bus steering logic to transfer all four bytes to the upper paths. The contents of all four bytes are then latched into the corresponding latches. When all four bytes of data are present on data paths 4 through 7, the BRDY signal is asserted and the processor completes its bus cycle after latching the contents of data paths 4 through 7.

[Fig. 2.13 Data bus steering transceivers required by 32-bit devices]

Review Questions
1. Give the contents of various registers of Pentium processor immediately after hardware reset.
2. Write a short note on bus operations of Pentium processor.
3. Explain various bus cycle states of Pentium processor.
4. Draw the state transition diagram for Pentium processor.
5. What is non-pipelined and pipelined bus cycle ?
6. Explain the non-pipelined read cycle with the help of timing diagram.
7. Explain the non-pipelined write cycle with the help of timing diagram.
8. Explain the pipelined read cycle with the help of timing diagram.
9. Explain the pipelined write cycle with the help of timing diagram.
10. Explain burst cycle of Pentium processor.
11. Write short notes on -
    a) Memory organisation of Pentium processor
    b) I/O organisation of Pentium processor.
12. What is data bus steering ?
13. Explain the data bus steering for 8-bit devices.
14. Explain the data bus steering for 16-bit devices.
15. Explain the data bus steering for 32-bit devices.

Pentium Programming

3.1 Introduction
In this chapter we are going to study the programming environment of the Pentium processor.
It includes the study of the programmer's model, register set, addressing modes, data types and instruction set supported by the Pentium processor.

3.2 Programmer's Model
The programming model makes it easier to understand the processor in a programming environment. The Pentium processor can be operated in three basic modes : real mode, protected mode and virtual 8086 mode. We have introduced the real mode programmer's model of the Pentium processor in Chapter 1. The protected mode programmer's model of the Pentium processor includes some more registers. Let us study the protected mode programmer's model with a detailed description of each register.
Fig. 3.1 shows the programmer's model for the Pentium processor. In the figure, only the shaded portion is a part of real mode. It consists of eight 16-bit registers (IP, CS, DS, SS, ES, FS, GS and Flag register) and eight 32-bit registers (EAX, EBX, ECX, EDX, ESP, EBP, ESI, EDI). In real mode, the Pentium can access CR0, which is used to enter protected mode. The Protection Enable bit (PE) is used to switch the Pentium from real to protected mode.
The registers in the programmer's model of the Pentium processor can be categorised according to their usage as given below.
1. General purpose registers
2. Segment registers
3. Index, pointers and base registers
4. Flag registers
5. System address registers
6. Control registers
7. Debug registers
8. Test registers

[Fig. 3.1 Programmer's model of Pentium processor : register set together with the code, stack and data segments of 64K bytes each. Note : Shaded registers indicate the real mode model of the Pentium processor.]

3.2.1 General Purpose Registers
The Pentium contains the 32-bit general purpose registers EAX, EBX, ECX, EDX, ESI, EDI, EBP, and ESP to hold the following items :
- Operands for logical and arithmetic operations
- Operands for address calculations
- Memory pointers
Although all of these registers are available for general storage of operands, results, and pointers, caution should be used when referencing the ESP register. The ESP register holds the stack pointer and as a general rule should not be used for another purpose.
Many instructions assign specific registers to hold operands. For example, string instructions use the contents of the ECX, ESI, and EDI registers as operands. When using a segmented memory model, some instructions assume that pointers in certain registers are relative to specific segments. For instance, some instructions assume that a pointer in the EBX register points to a memory location in the DS segment.
Fig. 3.2 shows the general purpose registers in the Pentium. The lower 16 bits of each general purpose register can be accessed individually. These 16-bit registers are accessed as AX, BX, CX, DX, SP, BP, SI, and DI respectively.
The AX, BX, CX and DX registers can be further divided into two separate bytes : a higher byte and a lower byte. For example, AX consists of AH (higher byte) and AL (lower byte). These bytes can be individually accessed as AH, AL, BH, BL, CH, CL, DH, and DL.
Note : A register name beginning with an E (for example EAX) indicates that the register width is 32 bits. A register name ending with an X (for example AX) indicates a 16-bit register, and a register name ending with H or L (for example AH or AL) indicates an 8-bit register.
The other four general purpose registers are the two pointer registers, ESP and EBP, and the two index registers, ESI and EDI. These registers are used to do special functions.
They are used to store offset addresses of memory locations relative to the segment registers.
The index registers ESI and EDI are used to store offset values that are incremented or decremented when stepping through a block of data. The index registers are also used to hold offset addresses for instructions that access data stored in the data segment part of memory. Thus these registers can be combined with the value in the DS register using index addressing.
The pointer registers ESP and EBP are used to store offset addresses of memory locations relative to the stack segment register.
The summary of special uses of the general purpose registers is as follows :
EAX - Accumulator for operands and results data
EBX - Pointer to data in the DS segment
ECX - Counter for string and loop operations
EDX - I/O pointer
ESI - Pointer to data in the segment pointed to by the DS register; source pointer for string operations
EDI - Pointer to data (or destination) in the segment pointed to by the ES register; destination pointer for string operations
ESP - Stack pointer (in the SS segment)
EBP - Pointer to data on the stack (in the SS segment)

32-bit register | 16-bit register | 8-bit registers
EAX (Accumulator) | AX | AH, AL
EBX (Base) | BX | BH, BL
ECX (Count) | CX | CH, CL
EDX (Data) | DX | DH, DL
ESP (Stack Pointer) | SP |
EBP (Base Pointer) | BP |
ESI (Source Index) | SI |
EDI (Destination Index) | DI |
Fig. 3.2 General purpose registers

3.2.2 Segment Registers
The segment registers (CS, DS, SS, ES, FS, and GS) hold 16-bit segment selectors. A segment selector is a special pointer that identifies a segment in memory. To access a particular segment in memory, the segment selector for that segment must be present in the appropriate segment register. Fig. 3.3 shows the segment registers.

Register (bits 15 to 0) | Segment
CS | Code segment
DS | Data segment
SS | Stack segment
ES | Extra segment
FS | Extra segment
GS | Extra segment
Fig. 3.3 Segment registers

1. The CS (Code Segment) register holds the base address of the currently active code segment.
2. The DS (Data Segment) register is used to hold the address of the currently active data segment.
3. The ES (Extra Segment), FS, and GS registers are used as general data segment registers. These registers hold the base addresses of three different memory segments. These segments are referred to as Extra Segments.
4. The base address of the currently active stack segment is contained in the SS (Stack Segment) register.

3.2.3 Index, Pointers, and Base Registers
As mentioned earlier, the physical address of any memory location within a selected memory segment is obtained by adding the segment address and the offset. (The contents of the segment register are shifted left by 4 and the offset is added to the shifted contents of the segment register to generate the physical address.) The offset used to calculate the physical address is contained in any of the pointer, base, or index registers. Table 3.1 shows the segment registers and the offset registers used with the corresponding segments.

Segment register | Offset registers
CS (Code Segment) | (E)IP (Instruction Pointer)
SS (Stack Segment) | (E)SP (Stack Pointer), (E)BP (Base Pointer)
DS (Data Segment) | (E)BX (Base Register), (E)SI (Source Index Register), (E)DI (Destination Index Register)
ES, FS and GS (Extra Segments) | (E)BX (Base Register), (E)SI (Source Index Register)
Table 3.1 Segments and offset registers

3.2.4 EFLAGs Register
A flag is a flip-flop which indicates some condition produced by the execution of an instruction or controls certain operations of the EU.
The 32-bit EFLAGS register contains a group of status flags, a control flag, and a group of system flags. Fig. 3.4 defines the flags within this register.

[Fig. 3.4 EFLAGs register : from the most significant end, ID Flag, Virtual Interrupt Pending, Virtual Interrupt Flag, Alignment Check, Virtual 8086 Mode, Resume Flag, Nested Task, I/O Privilege Level, Overflow Flag, Direction Flag, Interrupt Enable Flag, Trap Flag, Sign Flag, Zero Flag, Auxiliary Carry Flag, Parity Flag, Carry Flag. Note : all bits shown with a one or a zero are Intel reserved. They must always be set to the values previously read from them.]

These flags can be categorised in three different groups :
1. Status flags : These flags reflect the state of a particular program.
2. Control flags : These flags directly affect the operation of a few instructions.
3. System flags : These flags reflect the current status of the machine and are usually used by the operating system rather than by application programs.

3.2.4.1 Status Flags
The status flags are : CF (Carry flag), PF (Parity flag), AF (Auxiliary carry flag), ZF (Zero flag), SF (Sign flag), and OF (Overflow flag). These flags indicate some condition produced by the execution of arithmetic or logical instructions. They provide the information necessary for arithmetic and logical control decisions.
CF (Bit 0) Carry flag :
This bit is set by arithmetic instructions that generate either a carry or a borrow. This bit can also be set, cleared, or inverted with the STC, CLC or CMC instructions, respectively. The carry flag is also used in shift and rotate instructions to contain the bit shifted or rotated out of the register.
PF (Bit 2) Parity flag :
The parity bit is set by most instructions if the least significant 8 bits of the result contain an even number of ones.
AF (Bit 4) Auxiliary carry flag :
This bit is set when there is a carry or borrow after a nibble addition or subtraction, respectively.
The programmer can't access this bit directly, but this bit is internally used for BCD arithmetic.
ZF (Bit 6) Zero flag :
The zero flag is set to 1 if the result of an operation is zero.
SF (Bit 7) Sign flag :
Signed numbers are represented by a combination of sign and magnitude. The most significant bit (MSB) indicates the sign of the number. For a negative number the MSB is 1. The sign flag is set to 1 if the result of an operation is negative (MSB = 1).
OF (Bit 11) Overflow flag :
In 2's complement arithmetic, the most significant bit is used to represent the sign and the remaining bits are used to represent the magnitude of a number (see Fig. 3.5). This flag is set if the result of a signed operation is too large to fit in the number of bits available to represent it (7 bits for an 8-bit number).

[Fig. 3.5 Sign and magnitude representation]

For example, suppose you add the 8-bit signed number 01110110 (+118 decimal) and the 8-bit signed number 00110110 (+54 decimal). The result is 10101100 (+172 decimal), which is the correct binary result. But in this case it is too large to fit in the 7 bits allowed for the magnitude in an 8-bit signed number. The overflow flag will be set after this operation to indicate that the result of the addition has overflowed into the sign bit.

3.2.4.2 Control Flags
DF (Bit 10) Direction flag :
The direction flag controls the direction of string operations. When the D flag is cleared, these operations process strings from low memory up towards high memory. This means that the offset pointers (usually SI and DI) are incremented by 1 after each operation in the string instructions when the D flag is cleared. If the D flag is set, then SI and DI are decremented by 1 after each operation to process strings from high to low memory.

3.2.4.3 System Flags
VM (Bit 17) Virtual 8086 Mode flag :
This flag indicates the operating mode of the Pentium.
When the VM flag is set, the Pentium switches from protected mode to virtual 8086 mode.

RF (Bit 16) Resume flag : This flag, when set, allows selective masking of some exceptions at the time of debugging.

NT (Bit 14) Nested task flag : This flag is set when one system task invokes another task (i.e. a nested task).

IOPL (Bits 12 and 13) I/O Privilege level : The two bits in the IOPL are used by the processor and the operating system to determine your application's access to I/O facilities. The field holds the privilege level, from 0 to 3, that the current code needs in order to execute any I/O related instruction.

IF (Bit 9) Interrupt flag : When the interrupt flag is set, the Pentium recognizes and handles external hardware interrupts on its INTR pin. If the interrupt flag is cleared, the Pentium ignores any inputs on this pin. The IF flag is set and cleared with the STI and CLI instructions, respectively.

TF (Bit 8) Trap flag : The trap flag allows the user to single-step through programs, a facility used for debugging. When the Pentium detects that this flag is set, it executes one instruction and then automatically generates an internal exception 1. After servicing the exception, the processor executes the next instruction and repeats the process. This single stepping continues until program code resets the flag.

AC (Bit 18) Alignment check flag : Alignment checking of memory references is enabled by setting the AC flag along with the AM bit in the CR0 register. Alignment checking is disabled when either the AC flag or the AM bit (or both) is cleared.

ID (Bit 21) Identification flag : The ability of a program to set or clear this flag indicates support for the CPUID instruction.

VIF (Bit 19) Virtual interrupt flag : Virtual image of the IF flag. Used in conjunction with the VIP flag. (To use this flag and the VIP flag, the virtual mode extensions are enabled by setting the VME flag in control register CR4.)
VIP (Bit 20) Virtual interrupt pending flag : Set to indicate that an interrupt is pending; clear when no interrupt is pending. (Software sets and clears this flag; the processor only reads it.) Used in conjunction with the VIF flag.

3.2.5 More about EFLAGS
Following the initialization of the processor (either by asserting the RESET pin or the INIT pin), the state of the EFLAGS register is 00000002H. Bits 1, 3, 5, 15, and 22 through 31 of this register are reserved. Software should not use or depend on the states of any of these bits.

Some of the flags in the EFLAGS register can be modified directly, using special-purpose instructions. There are no instructions that allow the whole register to be examined or modified directly. The following instructions can be used to move groups of flags to and from the procedure stack or the EAX register : LAHF, SAHF, PUSHF, PUSHFD, POPF, and POPFD. After the contents of the EFLAGS register have been transferred to the procedure stack or the EAX register, the flags can be examined and modified using the processor's bit manipulation instructions (BT, BTS, BTR, and BTC).

When suspending a task (using the processor's multitasking facilities), the processor automatically saves the state of the EFLAGS register in the task state segment (TSS) for the task being suspended. When binding itself to a new task, the processor loads the EFLAGS register with data from the new task's TSS.

When a call is made to an interrupt or exception handler procedure, the processor automatically saves the state of the EFLAGS register on the procedure stack. When an interrupt or exception is handled with a task switch, the state of the EFLAGS register is saved in the TSS for the task being suspended.
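The flag positions given in section 3.2.4 and the overflow example in section 3.2.4.1 can be checked with a short model. The Python sketch below is only an illustration written for this text, not Pentium code; the names EFLAGS_BITS, decode_eflags and add8_flags are hypothetical helpers. It decodes a 32-bit EFLAGS value into flag names and models how an 8-bit addition would set the status flags.

```python
# Bit positions of the single-bit EFLAGS flags, as listed in section 3.2.4.
EFLAGS_BITS = {
    "CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7, "TF": 8, "IF": 9,
    "DF": 10, "OF": 11, "NT": 14, "RF": 16, "VM": 17, "AC": 18,
    "VIF": 19, "VIP": 20, "ID": 21,
}


def decode_eflags(value):
    """Return (list of flag names that are set, IOPL field) for a 32-bit value."""
    flags = [name for name, bit in EFLAGS_BITS.items() if (value >> bit) & 1]
    iopl = (value >> 12) & 0b11  # IOPL occupies bits 12 and 13
    return flags, iopl


def add8_flags(a, b):
    """Model an 8-bit ADD; return the result and the six status flags."""
    total = a + b
    result = total & 0xFF
    return {
        "result": result,
        "CF": int(total > 0xFF),                       # carry out of bit 7
        "PF": int(bin(result).count("1") % 2 == 0),    # even number of ones
        "AF": int((a & 0xF) + (b & 0xF) > 0xF),        # carry out of low nibble
        "ZF": int(result == 0),
        "SF": result >> 7,                             # copy of the MSB
        # Overflow: both operands have the same sign, result has the other sign.
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }


if __name__ == "__main__":
    print(decode_eflags(0x00000002))            # state after RESET or INIT
    print(add8_flags(0b01110110, 0b00110110))   # +118 plus +54
```

For the reset value 00000002H no named flag is set, because bit 1 is a reserved bit. For the +118 plus +54 example the model reports OF = 1 and SF = 1 with CF = 0, which matches the description of a result that has overflowed into the sign bit.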
3.2.6 System Address Registers
There are four system address registers : TR (Task Register), IDTR (Interrupt Descriptor Table Register), GDTR (Global Descriptor Table Register) and LDTR (Local Descriptor Table Register). Fig. 3.6 shows these special registers, which are used in protected mode. These registers hold the addresses of the four special descriptor table segments.

The TR (Task Register) points to the task state segment (TSS).
The IDTR (Interrupt Descriptor Table Register) points to the Interrupt Descriptor Table (IDT).
The GDTR (Global Descriptor Table Register) points to the Global Descriptor Table (GDT).
The LDTR (Local Descriptor Table Register) points to the Local Descriptor Table (LDT).

Fig. 3.6 Protected mode registers (GDTR and IDTR : bits 47 to 0; LDTR and TR : bits 15 to 0)

3.2.7 System Registers
To assist in initializing the processor and controlling system operations, the system architecture provides system flags in the EFLAGS register and several system registers. These include control registers, debug registers, test registers and model-specific registers.

The control registers (CR0, CR2, CR3, and CR4) contain a variety of flags and data fields for controlling system-level operations. Other flags in these registers are used to indicate support for specific processor capabilities within the operating system or executive.

The debug registers allow the setting of breakpoints for use in debugging programs and systems software.

The task register contains the linear address and size of the TSS for the current task.

The model-specific registers (MSRs) are a group of registers available primarily to operating system or executive procedures (that is, code running at privilege level 0). These registers control items such as the debug extensions, the performance-monitoring counters, the machine check architecture, and the memory type ranges (MTRRs).
3.2.7.1 Control Registers
Control registers determine the operating mode of the processor and the characteristics of the currently executing task. These registers are 32 bits wide in all 32-bit modes and in compatibility mode. There are five control registers : CR0, CR1, CR2, CR3 and CR4. Fig. 3.7 shows the control registers. These registers define the machine state that affects all the tasks in the system.

Fig. 3.7 Control registers (CR4 : flags including VME, PVI, TSD, DE, PSE, PAE, MCE, PGE, OSFXSR and OSXMMEXCPT, remaining bits reserved and set to 0; CR3 : page-directory base (PDBR) with the PCD and PWT flags; CR2 : page-fault linear address; CR1 : reserved; CR0 : PG, CD, NW, AM, WP, NE, ET, TS, EM, MP and PE flags)

Control Register 0 (CR0)
Control Register 0 contains system control flags that control the operating mode and states of the processor. It holds the MSW (Machine Status Word). It contains the following flags : PE (Protection Enable), MP (Math Present), EM (Emulate Coprocessor), TS (Task Switched), ET (Extension Type), NE (Numeric Error), WP (Write Protect), AM (Alignment Mask), NW (Not Write-through), CD (Cache Disable) and PG (Paging).

PE (Bit 0) Protection Enable : This bit is similar to the VM bit in EFLAGS in that it controls the Pentium's mode of operation. When PE is set, the processor is in protected mode; otherwise it operates in real mode.

MP (Bit 1) Math Present : When this bit is set, the Pentium assumes that real floating point hardware (80287 or 80387) is present in the system. When this bit is clear, the Pentium assumes that no such coprocessor exists, and will not attempt to use real floating point hardware.

EM (Bit 2) Emulate Coprocessor : When this bit is set, the Pentium generates exception 7 (device not available) whenever it attempts to execute a floating point instruction.
The programmer can use the handler for this exception to emulate floating point hardware in software.

TS (Bit 3) Task Switched : The Pentium sets this bit automatically every time it performs a task switch. It never clears this bit on its own, but the programmer can clear it using the CLTS instruction.

ET (Bit 4) Extension Type : When power is applied, the Pentium detects whether the numeric processor connected is an 80287 or an 80387 and sets ET to logic 1 if the numeric processor is an 80387. This is necessary because the 80387 uses a slightly different protocol than the 80287.

NE (Bit 5) Numeric Error : When set, enables the internal mechanism for reporting x87 FPU errors; when clear, enables the PC-style x87 FPU error reporting mechanism. When the NE flag is clear and the IGNNE# input is asserted, x87 FPU errors are ignored. When the NE flag is clear and the IGNNE# input is deasserted, an unmasked x87 FPU error causes the processor to assert the FERR# pin to generate an external interrupt and to stop instruction execution immediately before executing the next waiting floating-point instruction or WAIT/FWAIT instruction.

WP (Bit 16) Write Protect : When set, inhibits supervisor-level procedures from writing into user-level read-only pages; when clear, allows supervisor-level procedures to write into user-level read-only pages.

AM (Bit 18) Alignment Mask : When set, enables automatic alignment checking; when clear, disables alignment checking. Alignment checking is performed only when the AM flag is set, the AC flag in the EFLAGS register is set, CPL is 3, and the processor is operating in either protected or virtual-8086 mode.

NW (Bit 29) Not Write-Through and CD (Bit 30) Cache Disable : Table 3.2 shows the interpretation of the CD and NW bits within CR0.

CD = 1, NW = 1 : Read hits access the cache. Read misses do not cause line fills. Write hits update the cache, but not external memory.
Write hits cause Exclusive (E) state lines to change to Modified (M) state. Shared lines remain in the Shared (S) state after write hits. Write misses access memory. Inquire and invalidation cycles do not affect the cache contents or state.

CD = 1, NW = 0 : Read hits access the cache. Read misses do not cause line fills. Write hits update the cache. Writes to S state lines and write misses update external memory. Writes to S state lines change to the E state when WB/WT# = 1. Inquire and invalidation cycles affect the cache contents and state.

CD = 0, NW = 1 : Illegal combination; results in a General Protection (GP) fault.

CD = 0, NW = 0 : Read hits access the cache. Read misses cause line fills if CACHE# and KEN# are asserted. Cache lines are initially entered in the E or S state depending on the state of WB/WT# (E = 1, S = 0). Write hits update the cache. Only writes to S state lines and write misses access external memory. Writes to S state lines change to the E state when WB/WT# = 1. Inquire and invalidation cycles affect the cache contents and state.

Table 3.2 Interpretation of the CD and NW bits within CR0

PG (Bit 31) Paging : This bit enables or disables the paging mechanism in the Memory Management Unit (MMU). If the bit is set, paging is enabled.

Control Register 1 (CR1)
This register is reserved by Intel.

Control Register 2 (CR2)
Control Register 2 contains the page-fault linear address (the linear address that caused a page fault). Software normally only reads CR2. When a page fault occurs, the Pentium generates exception 14 (page fault) and itself writes the 32-bit linear address that caused the fault into this register. This address is important when writing the page fault routine, because it helps the programmer find the cause of the fault.

Control Register 3 (CR3)
Control Register 3 holds the physical address of the root of the two-level paging tables used when paging is enabled. It is also called the Page Directory Base Register (PDBR).
Only the most-significant bits (less the lower 12 bits) of the base address are specified; the lower 12 bits of the address are assumed to be 0. The page directory must thus be aligned to a page (4-KByte) boundary. The PCD and PWT flags control caching of the page directory in the processor's internal data caches (they do not control TLB caching of page-directory information).

PCD (Bit 4) Page-level Cache Disable : It controls caching of the current page directory. When the PCD flag is set, caching of the page directory is prevented; when the flag is clear, the page directory can be cached. This flag affects only the processor's internal caches (both L1 and L2, when present). The processor ignores this flag if paging is not used (the PG flag in register CR0 is clear) or the CD (cache disable) flag in CR0 is set.

PWT (Bit 3) Page-level Writes Transparent : It controls the write-through or write-back caching policy of the current page directory. When the PWT flag is set, write-through caching is enabled; when the flag is clear, write-back caching is enabled. This flag affects only internal caches (both L1 and L2, when present). The processor ignores this flag if paging is not used (the PG flag in register CR0 is clear) or the CD (cache disable) flag in CR0 is set.

Control Register 4 (CR4)
Control Register 4 contains a group of flags that enable several architectural extensions, and indicate operating system or executive support for specific processor capabilities. The control registers can be read and loaded (or modified) using the move-to-or-from-control-registers forms of the MOV instruction. In protected mode, the MOV instructions allow the control registers to be read or loaded at privilege level 0 only. This restriction means that application programs or operating-system procedures running at privilege levels 1, 2, or 3 are prevented from reading or loading the control registers.
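Once privileged code has read a control register, its fields are separated with shifts and masks. The Python sketch below is an illustration only, using the bit positions given above for CR0 and CR3; the names CR0_BITS, decode_cr0 and decode_cr3 are hypothetical and the register values are made-up examples, not values read from a processor.

```python
# CR0 flag positions as listed in the description of Control Register 0.
CR0_BITS = {
    "PE": 0, "MP": 1, "EM": 2, "TS": 3, "ET": 4, "NE": 5,
    "WP": 16, "AM": 18, "NW": 29, "CD": 30, "PG": 31,
}


def decode_cr0(value):
    """Return the names of the CR0 flags that are set in a 32-bit value."""
    return [name for name, bit in CR0_BITS.items() if (value >> bit) & 1]


def decode_cr3(value):
    """Split a 32-bit CR3 value into page-directory base, PCD and PWT."""
    return {
        # The lower 12 bits of the base are assumed to be 0 (4-KByte alignment).
        "page_directory_base": value & 0xFFFFF000,
        "PCD": (value >> 4) & 1,
        "PWT": (value >> 3) & 1,
    }


if __name__ == "__main__":
    print(decode_cr0(0x80000011))       # paging and protection enabled
    cr3 = decode_cr3(0x00123018)
    print(hex(cr3["page_directory_base"]), cr3["PCD"], cr3["PWT"])
```

Masking with FFFFF000H is what makes the page directory start on a 4-KByte boundary: whatever is stored in the low 12 bits of CR3 is never treated as part of the address.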
VME (Bit 0) Virtual-8086 Mode Extensions : When set, it enables interrupt- and exception-handling extensions in virtual-8086 mode; when clear, it disables the extensions. Use of the virtual mode extensions can improve the performance of virtual-8086 applications by eliminating the overhead of calling the virtual-8086 monitor to handle interrupts and exceptions that occur while executing an 8086 program and, instead, redirecting the interrupts and exceptions back to the 8086 program's handlers. It also provides hardware support for a virtual interrupt flag (VIF) to improve the reliability of running 8086 programs in multitasking and multiple-processor environments.

PVI (Bit 1) Protected-Mode Virtual Interrupts : When set, it enables hardware support for a virtual interrupt flag (VIF) in protected mode; when clear, it disables the VIF flag in protected mode.

TSD (Bit 2) Time Stamp Disable : When set, it restricts the execution of the RDTSC instruction to procedures running at privilege level 0; when clear, it allows the RDTSC instruction to be executed at any privilege level.

DE (Bit 3) Debugging Extensions : When set, references to debug registers DR4 and DR5 cause an undefined opcode exception to be generated; when clear, the processor aliases references to registers DR4 and DR5 for compatibility with software written to run on earlier processors from the Intel 32-bit family.

PSE (Bit 4) Page Size Extensions : When set, it enables 4-MByte pages; when clear, it restricts pages to 4 KBytes.

PAE (Bit 5) Physical Address Extension : When set, it enables the paging mechanism to reference physical addresses of 36 bits or more. When clear, it restricts physical addresses to 32 bits.

MCE (Bit 6) Machine-Check Enable : When set, it enables the machine-check exception; when clear, it disables the machine-check exception.

PGE (Bit 7) Page Global Enable : (Introduced in the P6 family processors.)
When set, it enables the global page feature; when clear, it disables the global page feature. The global page feature allows frequently used or shared pages to be marked as global to all users (done with the global flag, bit 8, in a page-directory or page-table entry). Global pages are not flushed from the translation-lookaside buffer (TLB) on a task switch or a write to register CR3.

3.2.7.2 Debug Registers
The debug registers allow the Pentium to provide debugging features. Registers DR0 to DR7 are used to control these features. The debug registers DR0 to DR3 each contain the address associated with one of four breakpoints defined by certain bits in debug register 7 (DR7). Fig. 3.8 shows the debug registers. The software debugger can load breakpoint addresses into these registers to aid in debugging.

Fig. 3.8 Debug registers (DR7 : LEN0-LEN3, R/W0-R/W3, GD, GE, LE, G0-G3 and L0-L3 fields; DR6 : BT, BS, BD and B0-B3 flags; DR5 and DR4 : reserved; DR3 to DR0 : breakpoint 3 to breakpoint 0 linear addresses)

These registers can be written to and read using the move to or from debug register form of the MOV instruction. A debug register may be the source or destination operand for one of these instructions. The debug registers are privileged resources; a MOV instruction that accesses these registers can only be executed in real-address mode, in SMM, or in protected mode at a CPL of 0. An attempt to read or write the debug registers from any other privilege level generates a general protection exception.

Debug Registers 0 through 3
The first four debug registers (DR0 - DR3) hold four linear addresses for breakpoints.
The addresses in these registers are compared with the address of each instruction at the time of instruction execution and, if a match is found, an exception 1 (debug fault) is generated. This allows the Pentium to monitor up to four different addresses in the system.

For each breakpoint, the following information can be specified and detected with the debug registers :
• The linear address where the breakpoint is to occur.
• The length of the breakpoint location (1, 2, or 4 bytes).
• The operation that must be performed at the address for a debug exception to be generated.
• Whether the breakpoint is enabled.
• Whether the breakpoint condition was present when the debug exception was generated.

Debug Registers 4 and 5
Registers 4 and 5 are undefined.

Debug Register 6
Debug register 6 is also called the debug status register. This register is updated only when an exception is generated. The Pentium sets the appropriate bits in this register to give information about the probable causes of the last debug fault (exception 1). The Pentium never clears these bits. The programmer must clear these status bits by writing into DR6. The status bits are :

B0 - B3 Breakpoint Condition Detected : When set, each bit indicates that its associated breakpoint condition was met when a debug exception was generated. These flags are set if the condition described for each breakpoint by the LENn and R/Wn flags in debug control register DR7 is true. They are set even if the breakpoint is not enabled by the Ln and Gn flags in register DR7.

BD (Bit 13) Break For Debug Register Access : Access to the debug registers can be locked by setting the GD bit in DR7. The BD bit, when set, indicates that the exception 1 handler was invoked because the processor tried to access a debug register even though access was locked.
BS (Bit 14) Break For Single Step : This bit is set if the Pentium has invoked exception 1 because the trace bit is set (the TF bit is set in EFLAGS).

BT (Bit 15) Break For Task Switch : When set, this flag indicates that the debug exception resulted from a task switch where the T flag (debug trap flag) in the TSS of the target task was set.

Debug Register 7
This register controls the debug feature. By programming bits in this register, the programmer can configure the debug operation of the four linear address breakpoints. Each breakpoint is controlled by a set of four fields. These are :

L0 - L3 (Bits 0, 2, 4, and 6) Local Enable : When bit Ln is set, the breakpoint address in DRn is monitored as long as the Pentium is executing the current task. When a task switch occurs, this bit is cleared by the Pentium and it must be re-enabled by writing into DR7 if required.

G0 - G3 (Bits 1, 3, 5, and 7) Global Enable : When bit Gn is set, the breakpoint address in DRn is monitored at all times, regardless of task. This bit must be cleared by writing into DR7.

R/W0 - R/W3 (Bits 16, 17, 20, 21, 24, 25, 28, and 29) Read/Write Access : These bits decide the type of access that must occur at the address in DRn. Table 3.3 lists the different access types.

R/W bits in register DR7    Access type
00                          Code fetch
01                          Data write
10                          Reserved
11                          Data read or write

Table 3.3 R/W bits

LEN0 - LEN3 (Bits 18, 19, 22, 23, 26, 27, 30, and 31) Length Fields : The breakpoints are further distinguished by their size. Table 3.4 shows the different sizes of the breakpoints.

LEN bits in register DR7    Breakpoint size
00                          1 byte
01                          2 bytes, word aligned
10                          Reserved
11                          4 bytes, dword aligned

Table 3.4 LEN bits

LE (Local Exact) : The pipelined architecture of the Pentium fetches and decodes the next instruction before the current one completes. Because of this, the Pentium may not set the status bit in DR6 at the instant the breakpoint occurs.
If you set the local exact bit, the Pentium sets the corresponding status bit at the same instant at which the breakpoint occurs, while the Pentium is running the current task. When a task switch occurs this bit is cleared. This bit applies to all four linear breakpoints.

GE (Global Exact) : This is similar to the LE bit. If this bit is set, the Pentium reports a breakpoint at the instant it occurs, regardless of task.

GD (Global Debug access) : When this bit is set, the Pentium denies further access to any of the debug registers, either for reading or writing.

3.2.7.3 Test Registers
Among the eight test registers (TR0 - TR7), only two test registers (TR6 - TR7) are currently defined. Fig. 3.9 shows the bit pattern of the test registers. These registers are used to check the translation lookaside buffer (TLB) of the paging unit.

Fig. 3.9 Test registers (TR6 : linear address; TR7 : physical address)

Test Register 6
This is the TLB testing command register. By writing into this register, it is possible either to initiate a write directly into the Pentium's TLB or to perform TLB lookups. TR6 is divided into fields as follows :

C : This is the command bit. When this bit is cleared, a write to the TLB is performed. If it is set, the processor performs a TLB lookup.

The next 7 bits are used as tag attributes for the TLB cache, either when writing a new entry or when performing a TLB lookup.

W# (bit 5) : Not writable
W (bit 6) : Writable
U# (bit 7) : Not user
U (bit 8) : User
D# (bit 9) : Not dirty
D (bit 10) : Dirty
V (bit 11) : Valid

Test Register 7
This register is the data testing register of the TLB. When a program is performing writes, the entry to be stored is contained in this register, along with cache set information. TR7 is divided into fields as follows :

RP : This is the replacement pointer. This field indicates which set of the TLB's four-way set associative cache to write to.

H : This is the pointer location.
If this bit is set, the RP field determines which cache set to write to. If it is cleared, the set is determined with an internal algorithm.

Physical address (bits 12-31) : This is the data field of the TLB. This field contains either the physical address to be written into the TLB or the result of a valid TLB hit.

3.3 Pentium Addressing Modes
When the processor executes an instruction, it performs the specified function on data, which are referred to as operands. An operand may be part of the instruction, may reside in one of the internal registers of the processor, may be stored in memory, or may be held at an I/O port. To give the programmer flexibility, the processor provides different ways to access these operands from different locations. The different ways by which the processor can access data are referred to as addressing modes.

The Pentium provides a total of 11 addressing modes for instructions to specify operands. These addressing modes can be categorized in three groups :
• Register operand addressing
• Immediate operand addressing
• Memory operand addressing

The memory operand addressing modes are further classified as shown in Fig. 3.10.

Fig. 3.10 Pentium addressing modes (register operand addressing; immediate operand addressing; memory operand addressing, which is divided into direct, register indirect, based, index, scaled index, based index, based scaled index, based index with displacement, and based scaled index with displacement)

Register Operand Addressing Mode
In the register addressing mode, the operand is located in one of the 8, 16 or 32-bit general purpose registers of the Pentium. Table 3.5 shows the list of internal registers that can be used as a source or destination operand.

Register             Bytes (Reg 8)    Word (Reg 16)    Double word (Reg 32)
Accumulator          AL, AH           AX               EAX
Base                 BL, BH           BX               EBX
Count                CL, CH           CX               ECX
Data                 DL,
                     DH               DX               EDX
Stack pointer        -                SP               ESP
Base pointer         -                BP               EBP
Source index         -                SI               ESI
Destination index    -                DI               EDI
Code segment         -                CS               -
Data segment         -                DS               -
Stack segment        -                SS               -
E data segment       -                ES               -
F data segment       -                FS               -
G data segment       -                GS               -

Table 3.5 Direct addressing registers and their sizes

Examples :
For 8-bit operand : MOV AL, DL
This instruction copies the lower byte of the EDX register to the lower byte of the EAX register. Both source and destination operands are internal registers of the Pentium.

For 16-bit operand : MOV AX, DX
This instruction copies the lower word of the EDX register to the lower word of the EAX register.

For 32-bit operand : MOV EAX, EDX
This instruction copies the contents of the EDX register to the EAX register.

Immediate Operand Addressing Mode
In the immediate operand addressing mode, the operand is a part of the instruction, as shown in Fig. 3.11. The operand can be 8-bit, 16-bit or 32-bit.

Fig. 3.11 Instruction encoded with an immediate operand (opcode followed by the immediate operand)

Examples :
For 8-bit operand : MOV AL, 20H
This instruction copies 20H into the lower byte of the EAX register.

For 16-bit operand : MOV AX, 1020H
This instruction copies 1020H into the lower word of the EAX register.

For 32-bit operand : MOV EAX, 10B89C20H
This instruction copies 10B89C20H into the EAX register.

Memory Operand Addressing Modes
The remaining 9 addressing modes provide a mechanism for specifying the physical address of an operand. In the Pentium, the physical address is calculated before any read or write operation.
The physical address consists of two components : the segment base address and an effective address. The effective address can be specified in a variety of ways. One way is to encode the effective address of the operand directly in the instruction. This represents the direct addressing mode. The effective address can also be generated from combinations of four addressing elements : base, index, scale factor and displacement.

Base : The contents of any general purpose register.
Index : The contents of any general purpose register. The index registers are used to access the elements of an array, or a string of characters.
Scale : The index register's value can be multiplied by a scale factor of 1, 2, 4, or 8. Scaled index mode is especially useful for accessing arrays or structures.
Displacement : An 8, 16 or 32-bit immediate value encoded in the instruction.

The general formula for generating the effective address is as follows :

EA = Base + (Index x Scale factor) + Displacement

Fig. 3.12 shows the registers that can be used to hold the values of segment base, base, and index.

Fig. 3.12 Physical address generation (segment base from CS, SS, DS, ES, FS or GS; base and index from the general purpose registers; scale factor of 1, 2, 4 or 8; displacement)

Physical Address = Segment Base Address + Effective Address
PA = SBA + EA
PA = SBA + {Base + (Index x Scale factor) + Displacement}

Now we look at the different memory operand addressing modes :

Direct Mode : In this mode, the instruction contains the effective address of the operand. This effective address is used as an 8, 16 or 32-bit displacement from the location specified by the current value in the selected segment register; the default segment register is always DS.
Example : MOV EBX, [159DH]
Here, PA = DS + 159DH

Register Indirect Mode : In this mode, the base register gives the effective address of the operand.
Example : MOV EBX, [EAX]
Here, PA = DS + EAX

Based Mode : In this mode, a base register's contents are added to a displacement to form the effective address of the operand.
Example : MOV EBX, [EAX + 24]
Here, PA = DS + EAX + 24

Index Mode : In this mode, an index register's contents are added to a displacement to form the effective address of the operand.
Example : MOV EBX, 159DH + [SI]
Here, PA = DS + 159DH + SI

Scaled Index Mode : In this mode, an index register's contents are multiplied by a scaling factor and then added to a displacement to form the effective address of the operand. Scaling is available only with the 32-bit index registers.
Example : MOV EBX, 159DH + [ESI * 4]
Here, PA = DS + 159DH + (ESI x 4)

Based Index Mode : In this mode, the contents of a base register are added to the contents of an index register to form the effective address of the operand.
Example : MOV EBX, [ESI][EAX]
Here, PA = DS + ESI + EAX

Based Scaled Index Mode : In this mode, the contents of an index register are multiplied by a scaling factor and then added to the base register to obtain the effective address of the operand.
Example : MOV EBX, [ESI * 2][EAX]
Here, PA = DS + (ESI x 2) + EAX

Based Index Mode with Displacement : In this mode, the contents of an index register, a base register and a displacement are all added together to form the effective address of the operand.
Example : MOV EBX, [EAX][EDI + 24]
Here, PA = DS + EAX + EDI + 24

Based Scaled Index Mode with Displacement : In this mode, the contents of an index register are multiplied by a scaling factor and the result is then added to the contents of a base register and a displacement to form the effective address of the operand.
Example : MOV EBX, [EAX][ESI * 4] + 24
Here, PA = DS + EAX + (ESI x 4) + 24

3.4 Pentium Data Types
The Pentium can handle data types of 8 (byte), 16 (word), 32 (doubleword), and 64 (quadword) bits in length.
Table 3.6 lists the data types supported by the Pentium processor.

Fig. 3.13 Pentium numeric data formats :
Byte unsigned integer (bits 7 to 0)
Word unsigned integer (bits 15 to 0)
Double word unsigned integer (bits 31 to 0)
Quad word unsigned integer (bits 63 to 0)
Byte signed integer (2's complement form)
Word signed integer (2's complement form)
Double word signed integer (2's complement form)
Quad word signed integer (2's complement form)
Single precision floating point (sign : bit 31, exponent : bits 30 to 23, mantissa : bits 22 to 0)
Double precision floating point (sign : bit 63, exponent : bits 62 to 52, mantissa : bits 51 to 0)
Double extended precision floating point (sign : bit 79, exponent : bits 78 to 64, mantissa : bits 63 to 0)

Data Type        Description
General          8-bit (byte), 16-bit (word), 32-bit (double word), and 64-bit (quadword) locations.
Integer          A signed binary value represented in 2's complement form. It can be byte, word or doubleword in length.
Ordinal          An unsigned integer. It can be byte, word, or doubleword in length.
Unpacked BCD     (Binary Coded Decimal) One BCD digit (0 - 9) in one byte.
Packed BCD       Two BCD digits in one byte.
Near Pointer     A 32-bit effective address that represents the offset within a segment. Used for references within a segmented memory.
Bit field        Any bit position in the sequence of bits.
Byte string      A contiguous sequence of bytes.
Floating point   IEEE standard formats.

Table 3.6 Data types

3.5 Instruction Set Summary

3.5.1 Data Transfer Instructions
The data transfer instructions move data between memory and the general-purpose and segment registers. They also perform specific operations such as conditional moves, stack access, and data conversion.
MOV : Move data between general-purpose registers; move data between memory and general-purpose or segment registers; move immediates to general-purpose registers
XCHG : Exchange
BSWAP : Byte swap
XADD : Exchange and add
CMPXCHG : Compare and exchange
CMPXCHG8B : Compare and exchange 8 bytes
PUSH : Push onto stack
POP : Pop off of stack
PUSHA/PUSHAD : Push general-purpose registers onto stack
POPA/POPAD : Pop general-purpose registers from stack
CWD/CDQ : Convert word to doubleword/Convert doubleword to quadword
CBW/CWDE : Convert byte to word/Convert word to doubleword in EAX register
MOVSX : Move and sign extend
MOVZX : Move and zero extend

3.5.2 Binary Arithmetic Instructions

The binary arithmetic instructions perform basic binary integer computations on byte, word, and doubleword integers located in memory and/or the general-purpose registers.

ADD : Integer add
ADC : Add with carry
SUB : Subtract
SBB : Subtract with borrow
IMUL : Signed multiply
MUL : Unsigned multiply
IDIV : Signed divide
DIV : Unsigned divide
INC : Increment
DEC : Decrement
NEG : Negate
CMP : Compare

3.5.3 Decimal Arithmetic Instructions

The decimal arithmetic instructions perform decimal arithmetic on binary coded decimal (BCD) data.

DAA : Decimal adjust after addition
DAS : Decimal adjust after subtraction
AAA : ASCII adjust after addition
AAS : ASCII adjust after subtraction
AAM : ASCII adjust after multiplication
AAD : ASCII adjust before division

3.5.4 Logical Instructions

The logical instructions perform basic AND, OR, XOR, and NOT logical operations on byte, word, and doubleword values.

AND : Perform bitwise logical AND
OR : Perform bitwise logical OR
XOR : Perform bitwise logical exclusive OR
NOT : Perform bitwise logical NOT

3.5.5 Shift and Rotate Instructions

The shift and rotate instructions shift and rotate the bits in word and doubleword operands.
SAR : Shift arithmetic right
SHR : Shift logical right
SAL/SHL : Shift arithmetic left/Shift logical left
SHRD : Shift right double
SHLD : Shift left double
ROR : Rotate right
ROL : Rotate left
RCR : Rotate through carry right
RCL : Rotate through carry left

3.5.6 Bit and Byte Instructions

Bit instructions test and modify individual bits in word and doubleword operands. Byte instructions set the value of a byte operand to indicate the status of flags in the EFLAGS register.

BT : Bit test
BTS : Bit test and set
BTR : Bit test and reset
BTC : Bit test and complement
BSF : Bit scan forward
BSR : Bit scan reverse
SETE/SETZ : Set byte if equal/Set byte if zero
SETNE/SETNZ : Set byte if not equal/Set byte if not zero
SETA/SETNBE : Set byte if above/Set byte if not below or equal
SETAE/SETNB/SETNC : Set byte if above or equal/Set byte if not below/Set byte if not carry
SETB/SETNAE/SETC : Set byte if below/Set byte if not above or equal/Set byte if carry
SETBE/SETNA : Set byte if below or equal/Set byte if not above
SETG/SETNLE : Set byte if greater/Set byte if not less or equal
SETGE/SETNL : Set byte if greater or equal/Set byte if not less
SETL/SETNGE : Set byte if less/Set byte if not greater or equal
SETLE/SETNG : Set byte if less or equal/Set byte if not greater
SETS : Set byte if sign (negative)
SETNS : Set byte if not sign (non-negative)
SETO : Set byte if overflow
SETNO : Set byte if not overflow
SETPE/SETP : Set byte if parity even/Set byte if parity
SETPO/SETNP : Set byte if parity odd/Set byte if not parity
TEST : Logical compare

3.5.7 Control Transfer Instructions

The control transfer instructions provide jump, conditional jump, loop, and call and return operations to control program flow.
JMP : Jump
JE/JZ : Jump if equal/Jump if zero
JNE/JNZ : Jump if not equal/Jump if not zero
JA/JNBE : Jump if above/Jump if not below or equal
JAE/JNB : Jump if above or equal/Jump if not below
JB/JNAE : Jump if below/Jump if not above or equal
JBE/JNA : Jump if below or equal/Jump if not above
JG/JNLE : Jump if greater/Jump if not less or equal
JGE/JNL : Jump if greater or equal/Jump if not less
JL/JNGE : Jump if less/Jump if not greater or equal
JLE/JNG : Jump if less or equal/Jump if not greater
JC : Jump if carry
JNC : Jump if not carry
JO : Jump if overflow
JNO : Jump if not overflow
JS : Jump if sign (negative)
JNS : Jump if not sign (non-negative)
JPO/JNP : Jump if parity odd/Jump if not parity
JPE/JP : Jump if parity even/Jump if parity
JCXZ/JECXZ : Jump if register CX is zero/Jump if register ECX is zero
LOOP : Loop with ECX counter
LOOPZ/LOOPE : Loop with ECX and zero/Loop with ECX and equal
LOOPNZ/LOOPNE : Loop with ECX and not zero/Loop with ECX and not equal
CALL : Call procedure
RET : Return
IRET : Return from interrupt
INT : Software interrupt
INTO : Interrupt on overflow
BOUND : Detect value out of range

3.5.8 String Instructions

The string instructions operate on strings of bytes, allowing them to be moved to and from memory.
MOVS/MOVSB : Move string/Move byte string
MOVS/MOVSW : Move string/Move word string
MOVS/MOVSD : Move string/Move doubleword string
CMPS/CMPSB : Compare string/Compare byte string
CMPS/CMPSW : Compare string/Compare word string
CMPS/CMPSD : Compare string/Compare doubleword string
SCAS/SCASB : Scan string/Scan byte string
SCAS/SCASW : Scan string/Scan word string
SCAS/SCASD : Scan string/Scan doubleword string
LODS/LODSB : Load string/Load byte string
LODS/LODSW : Load string/Load word string
LODS/LODSD : Load string/Load doubleword string
STOS/STOSB : Store string/Store byte string
STOS/STOSW : Store string/Store word string
STOS/STOSD : Store string/Store doubleword string
REP : Repeat while ECX not zero
REPE/REPZ : Repeat while equal/Repeat while zero
REPNE/REPNZ : Repeat while not equal/Repeat while not zero

3.5.9 I/O Instructions

These instructions move data between the processor's I/O ports and a register or memory.

IN : Read from a port
OUT : Write to a port
INS/INSB : Input string from port/Input byte string from port
INS/INSW : Input string from port/Input word string from port
INS/INSD : Input string from port/Input doubleword string from port
OUTS/OUTSB : Output string to port/Output byte string to port
OUTS/OUTSW : Output string to port/Output word string to port
OUTS/OUTSD : Output string to port/Output doubleword string to port

3.5.10 Enter and Leave Instructions

These instructions provide machine-language support for procedure calls in block-structured languages.

ENTER : High-level procedure entry
LEAVE : High-level procedure exit

3.5.11 Flag Control (EFLAGS) Instructions

The flag control instructions operate on the flags in the EFLAGS register.
STC : Set carry flag
CLC : Clear the carry flag
CMC : Complement the carry flag
CLD : Clear the direction flag
STD : Set direction flag
LAHF : Load flags into AH register
SAHF : Store AH register into flags
PUSHF/PUSHFD : Push EFLAGS onto stack
POPF/POPFD : Pop EFLAGS from stack
STI : Set interrupt flag
CLI : Clear the interrupt flag

3.5.12 Segment Register Instructions

The segment register instructions allow far pointers (segment addresses) to be loaded into the segment registers.

LDS : Load far pointer using DS
LES : Load far pointer using ES
LFS : Load far pointer using FS
LGS : Load far pointer using GS
LSS : Load far pointer using SS

3.5.13 Miscellaneous Instructions

The miscellaneous instructions provide such functions as loading an effective address, executing a "no-operation," and retrieving processor identification information.

LEA : Load effective address
NOP : No operation
XLAT/XLATB : Table lookup translation
CPUID : Processor identification

Protected Mode

4.1 Introduction

In this chapter we will see the protected mode features of the Pentium processor. The complete capabilities of the Pentium processor are unlocked when it operates in Protected Mode. After reset the Pentium processor enters real mode, but by setting bit 0 in the CR0 register it is possible to operate the Pentium processor in Protected Mode.

Features of Protected Mode :
1. Protected Mode vastly increases the linear address space to four gigabytes (2^32 bytes) and allows the running of virtual memory programs of almost unlimited size (64 terabytes or 2^46 bytes).
2. Protected Mode allows the Pentium processor to run all of the existing 8086 and 80286 programs.
3. It provides sophisticated memory management and a hardware-assisted protection mechanism.
4. It provides special Pentium processor instructions for multitasking operating systems.
5. It supports a paging mechanism.

4.2 Protected Mode-Support Registers

Fig. 4.1 shows the protected mode register set of the Pentium processor.
It is a superset of the real mode register set. It has additional registers. These are :
1. Global Descriptor Table Register (GDTR) - 48 bits : It holds the 32-bit linear base address and 16-bit limit of the Global Descriptor Table (GDT).
2. Interrupt Descriptor Table Register (IDTR) - 48 bits : It holds the 32-bit linear base address and 16-bit limit of the Interrupt Descriptor Table (IDT).
3. Local Descriptor Table Register (LDTR) - 16 bits : It holds the 16-bit selector for the Local Descriptor Table descriptor.
4. Task Register (TR) - 16 bits : It holds the 16-bit selector for the Task State Segment descriptor.

Fig. 4.1 Protected mode register model of Pentium processor

In the protected mode register set, the functions of a few registers have been extended.
1. The instruction pointer is now 32 bits wide. It is called EIP.
2. More bits of the flag register (EFLAGS) are active.
3. All five control registers CR0 - CR4 are active.

The following sections describe the functions of these registers in detail.

4.3 Logical to Physical Address Translation

The Pentium processor has three distinct address spaces : logical, linear and physical. A logical address (also known as virtual address) consists of a selector and an offset. A selector is the contents of a segment register.

Fig. 4.2 Address translation overview

We know that, in real mode, the segmentation unit shifts the selector left four bits and adds the result to the offset to form the linear address.
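The real mode address formation stated above can be sketched in a few lines of Python. This is only an illustration under stated assumptions : the function name real_mode_linear_address and the sample values are made up for this sketch, and both the selector and the offset are assumed to be 16-bit quantities.

```python
def real_mode_linear_address(selector: int, offset: int) -> int:
    """Shift the selector left by four bits and add the offset (real mode)."""
    # Assumption for this sketch: selector and offset are 16-bit values.
    if not 0 <= selector <= 0xFFFF:
        raise ValueError("selector must fit in 16 bits")
    if not 0 <= offset <= 0xFFFF:
        raise ValueError("offset must fit in 16 bits")
    return (selector << 4) + offset


# Hypothetical sample values: selector 1234H, offset 0010H.
print(f"{real_mode_linear_address(0x1234, 0x0010):05X}H")
```

With the sample values, the selector 1234H becomes 12340H after the shift, and adding the offset 0010H gives 12350H.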
In protected mode, however, every segment selector has a linear base address associated with it, which is stored in the segment descriptor. A selector is used to point to a descriptor for the segment in a table of descriptors. The linear base address from the descriptor is then added to the 32-bit offset to generate the 32-bit linear address. This process is known as segmentation or segment translation. If the paging unit is not enabled, then the 32-bit linear address corresponds to the physical address. But if the paging unit is enabled, the paging mechanism translates the linear address space into the physical address space by page translation. This is illustrated in Fig. 4.2.

The following sections describe the segment translation and page translation mechanisms in detail.

4.4 Segmentation

Segmentation or segment translation is the process of converting a logical address into a linear address. Fig. 4.3 shows the segment translation mechanism. It shows how a selector is used to access a descriptor in a descriptor table. The 13-bit index part of the selector is multiplied by 8 and used as a pointer to the desired descriptor in a descriptor table. The index value is multiplied by 8 because each descriptor requires 8 bytes in the descriptor table. The descriptor in the descriptor table contains mainly the base address, segment limit and access rights byte. The Pentium processor adds the base address from the descriptor to the effective address or offset to generate a linear address.

Fig. 4.3 Segment translation mechanism

As shown in Fig. 4.3, the selector component of each logical address contains 2 bits which represent the privilege level of the program section requesting access to a segment. The descriptor of each segment contains 2 bits which represent the privilege level of that segment.
When an executing program attempts to access a segment, the memory management unit compares the privilege level in the selector with the privilege level in the descriptor. If the segment selector has the same or greater privilege level, then the memory management unit allows the segment to be accessed. If the selector privilege level is lower than the privilege level of the segment, the memory management unit denies the access and sends an interrupt signal to the CPU indicating a privilege level violation.

There are two major categories of descriptor table in a Pentium processor system : Global and Local. The Global Descriptor Table (GDT) is a general purpose table of descriptors which can be used by all programs to reference segments of memory, whereas a Local Descriptor Table (LDT) is set up in the system for an individual task or a closely related group of tasks. The table indicator (TI) bit in the selector decides which descriptor table should be referred to by the selector. When the TI bit is 0, the index portion of the selector refers to a descriptor in the GDT. When the TI bit is 1, it refers to a descriptor in the current LDT. This is illustrated in Fig. 4.4.

Fig. 4.4 Selector and descriptor tables

Fig. 4.4 shows that the first entry in the GDT is reserved by the processor and should be all zeros. This is known as the NULL descriptor. The processor does not cause an exception when a segment register (other than CS or SS) is loaded with a null selector. However, it will cause an exception when the segment register is used to access memory. This feature is useful for initialising unused segment registers so as to trap accidental references.
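The selector handling described above can be sketched as follows. This is a simplified illustration, not the processor's actual checking logic : the names Selector, decode_selector, descriptor_byte_offset, table_for, is_null and access_allowed are hypothetical, the sample selector 001BH is an assumed value, and the privilege test only mirrors the comparison stated in the text (level 0 is taken as the most privileged, so a numerically smaller level means greater privilege).

```python
from typing import NamedTuple


class Selector(NamedTuple):
    index: int  # 13-bit index into the descriptor table
    ti: int     # table indicator: 0 = GDT, 1 = current LDT
    rpl: int    # 2-bit privilege level carried in the selector


def decode_selector(value: int) -> Selector:
    """Split a 16-bit selector into its index, TI and privilege fields."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("selector must fit in 16 bits")
    return Selector(index=value >> 3, ti=(value >> 2) & 1, rpl=value & 0b11)


def descriptor_byte_offset(sel: Selector) -> int:
    """Each descriptor occupies 8 bytes, so the index is multiplied by 8."""
    return sel.index * 8


def table_for(sel: Selector) -> str:
    """TI = 0 selects the GDT, TI = 1 selects the current LDT."""
    return "LDT" if sel.ti else "GDT"


def is_null(sel: Selector) -> bool:
    """A null selector refers to entry 0 of the GDT."""
    return sel.index == 0 and sel.ti == 0


def access_allowed(selector_level: int, segment_level: int) -> bool:
    """Simplified check: same or greater privilege (numerically <=) is allowed."""
    return selector_level <= segment_level


# Hypothetical sample selector 001BH.
sel = decode_selector(0x001B)
print(sel, table_for(sel), descriptor_byte_offset(sel))
```

For the sample selector 001BH the sketch reports index 3, TI = 0 (GDT) and privilege level 3, so the descriptor starts 24 bytes from the base of the GDT.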
The Pentium processor has six segment registers :
• One for the current code segment (CS)
• One for the current stack segment (SS)
• Four for general data segments (DS, ES, FS, GS)

Segment registers (selectors) select segment descriptors :
• Thirteen bits select the descriptor
• One bit selects the descriptor table
• Two bits aid privilege checking

4.5 Segment Descriptors and Memory Management through Segmentation

In protected mode, the memory management unit (MMU) uses the segment selector to access a descriptor for the desired segment in a table of descriptors in memory. A segment descriptor is a special structure which describes the segment. Exactly one segment descriptor must be defined for each segment of the memory. Descriptors are eight-byte quantities which contain attributes about a given region of linear address space (i.e. a segment). These attributes include the 32-bit base linear address of the segment, the 20-bit length and granularity of the segment, the protection level, read, write or execute privileges, the default size of the operands (16-bit or 32-bit), and the type of segment. Fig. 4.5 shows the general format of a descriptor.

As shown in Fig. 4.5, a segment descriptor has the following fields.

Base : It contains the 32-bit base address for a segment. It thus defines the location of the segment within the 4 gigabyte linear address space. The Pentium processor concatenates the three fragments of the base address to form a single 32-bit address.

Limit : It defines the size of the segment. The Pentium processor concatenates the two fragments of the limit field to form a 20-bit value. The Pentium processor interprets this 20-bit value in two ways, depending on the setting of the granularity bit (G) :
If G bit is 0 : In units of one byte, to define a limit of up to 1 Mbyte (2^20 bytes).
If G bit is 1 : In units of 4 kilobytes, to define a limit of up to 4 gigabytes.

Granularity Bit : It specifies the units with which the limit field is interpreted.
When the bit is 0, the limit is interpreted in units of one byte; otherwise the limit is interpreted in units of 4 Kbytes.

The fields labelled in Fig. 4.5 are :
BASE : Base address of the segment
LIMIT : The length of the segment
P : Present bit : 1 = Present, 0 = Not present
DPL : Descriptor privilege level 0 - 3
S : Segment descriptor : 0 = System descriptor, 1 = Code or data segment descriptor
TYPE : Type of segment
A : Accessed bit
G : Granularity bit : 1 = Segment length is page granular, 0 = Segment length is byte granular
D : Default operation size (recognised in code segment descriptors only) : 1 = 32-bit segment, 0 = 16-bit segment
0 : Bit must be zero (0) for compatibility with future processors
AVL : Available field for user or OS
Note : In a maximum-size segment (i.e. a segment with G = 1 and segment limit 19...0 = FFFFFH), the lowest 12 bits of the segment base should be zero (i.e. segment base 11...0 = 000H).
Fig. 4.5 General Segment Descriptor Format

D (Default size) : When this bit is cleared, operands contained within this segment are assumed to be 16 bits in size. When it is set, operands are assumed to be 32 bits.

0 (Reserved by Intel) : It can neither be defined nor used by the user. This bit must be zero for compatibility with future processors.

AVL/U (User Bit) : This bit is completely undefined, and the Pentium processor ignores it. It is an available field/bit for the user or operating system.

Access Rights Byte

P (Present Bit) : The present (P) bit is 1 if the segment is loaded in the physical memory. If P = 0, then any attempt to access this segment causes a not present exception (exception 11).

DPL (Descriptor Privilege Level) : This 2-bit field defines the level of privilege associated with the memory space that the descriptor defines. DPL 0 is the most privileged whereas DPL 3 is the least privileged.
S (System Bit) : The segment S bit in the segment descriptor determines if a given segment is a system segment or a code or data segment. If the S bit is 1 then the segment is either a code or data segment; if it is 0 then the segment is a system segment.

Type : This specifies the specific descriptor among the various kinds of descriptors. (A detailed explanation is given in the following sections.)

A (Accessed Bit) : The Pentium processor automatically sets this bit when a selector for the descriptor is loaded into a segment register. This means that the Pentium processor sets the accessed bit whenever a memory reference is made by accessing the segment.

A Segment Descriptor
• Describes a segment.
• Must be created for every segment.
• Is created by the programmer.
• Determines the base address of the segment.
• Determines the size of the segment.
• Determines the type of the segment.
• Determines the privilege level of the segment.

Segment Descriptor Defines
• Base address (32 bits)
• Segment limit (20 bits)
• Type of segment (4 bits)
• Privilege level of segment (2 bits)
• Whether segment is physically present (1 bit)
• Whether segment has been accessed before (1 bit)
• Granularity of limit field (1 bit)
• Default size of operands within segment (1 bit)
• Intel reserved bit (1 bit)
• AVL bit (1 bit)

4.5.1 Types of Segment Descriptors

Fig. 4.6 shows the types of segment descriptors. As shown in Fig. 4.6, there are two main categories of segments : system segments and non-system segments. These two basic types are further categorised into five types.

System : LDT, TSS, Gate
Non-system : Code, Data
Fig. 4.6 Types of segment descriptors

4.5.1.1 Non-system Segment Descriptor

The code and data segment descriptors are the non-system segment descriptors. Fig.
4.7 shows the general format for code and data segment descriptors and Table 4.1 illustrates how the specific bits in the access rights byte are interpreted in data and code segment descriptors.

The fields labelled in Fig. 4.7 are :
D : 1 = Default instruction attributes are 32 bits, 0 = Default instruction attributes are 16 bits
AVL : Available field for user or OS
G : Granularity bit : 1 = Segment length is page granular, 0 = Segment length is byte granular
0 : Bit must be zero (0) for compatibility with future processors
Note : In a maximum-size segment (i.e. a segment with G = 1 and segment limit 19...0 = FFFFFH), the lowest 12 bits of the segment base should be zero (i.e. segment base 11...0 = 000H).
Fig. 4.7 General format for code/data segment descriptor

Bit Position : Name : Function
7 : Present (P) : P = 1 Segment is mapped into physical memory. P = 0 No mapping to physical memory exists, base and limit are not used.
6 - 5 : Descriptor Privilege Level (DPL) : Segment privilege attribute used in privilege tests.
4 : Segment Descriptor (S) : S = 1 Code or data (includes stacks) segment descriptor. S = 0 System segment descriptor or gate descriptor.

If data segment (S = 1, E = 0) :
3 : Executable (E) : E = 0 Descriptor type is data segment.
2 : Expansion Direction (ED) : ED = 0 Expand up segment, offsets must be ≤ limit. ED = 1 Expand down segment, offsets must be > limit.
1 : Writeable (W) : W = 0 Data segment may not be written into. W = 1 Data segment may be written into.

If code segment (S = 1, E = 1) :
3 : Executable (E) : E = 1 Descriptor type is code segment.
2 : Conforming (C) : C = 1 Code segment may only be executed when CPL ≥ DPL and CPL remains unchanged.
1 : Readable (R) : R = 0 Code segment may not be read. R = 1 Code segment may be read.

0 : Accessed (A) : A = 0 Segment has not been accessed. A = 1 Segment selector has been loaded into segment register or used by selector test instructions.
Table 4.1 Access rights for segments

The Executable (E) bit indicates whether the segment is a code or data segment. If the E bit is 1, the segment is a code segment; otherwise it is a data segment. The code segment may be executable, or executable and readable. This is determined by the Readable (R) bit. If the R bit is 1, the code segment is executable and readable; otherwise it is only executable. If the conforming bit (C) is 1, the code segment can be executed and shared by programs at different privilege levels.

In the case of a stack segment, the segment starts at the base linear address plus the maximum segment limit, whereas a data segment starts at the base linear address and expands to the base linear address plus limit, as shown in Fig. 4.8.

Fig. 4.8 Expansion direction for data and stack segment

The expansion direction (ED) bit specifies the expansion direction for the segment. If ED = 0, the expansion direction is upward, which is a data segment, and if ED = 1, the expansion direction is downward, which is a stack segment.

The write (W) bit for a data segment indicates whether the data segment is read only, or read and write. If the W bit is 0, the data segment is read only; otherwise it is a read/write segment. For a stack segment the W bit must be logic 1.

Fig. 4.9 (a) and (b) show the code segment descriptor access rights byte configuration and the data segment descriptor access rights byte configuration.

Bit 0 (LSB) : Accessed (1 = yes)
Bit 1 : Readable (1 = yes)
Bit 2 : Conforming (1 = yes)
Bit 3 : Executable (1 = yes for code)
Bit 4 : 1 (indicates segment descriptor for code or data)
Bits 6 - 5 : Descriptor Privilege Level
Bit 7 (MSB) : Present (1 = yes)
Fig.
4.9 (a) Code segment descriptor access right byte configuration

Bit 0 (LSB) : Accessed (1 = yes)
Bit 1 : Writeable (1 = yes)
Bit 2 : Expand down (1 = down)
Bit 3 : Executable (0 = no for data)
Bit 4 : 1 (indicates segment descriptor for code or data)
Bits 6 - 5 : Descriptor Privilege Level
Bit 7 (MSB) : Present (1 = yes)
Fig. 4.9 (b) Data segment descriptor access right byte configuration

4.5.1.2 System Segment Descriptors

System segments give the information about operating system tables, tasks and gates. Fig. 4.10 shows the general format of the system segment descriptor. From Fig. 4.10 it can be seen that several descriptor fields (base address, limit, granularity bit G and present bit P) are similar to those of the general segment descriptor. Fig. 4.10 also shows the various types of system segment descriptors. Let us discuss the various system segment descriptors.

a) LDT Descriptors (S = 0, Type = 2) : The LDT descriptors are present only in the Global Descriptor Table (GDT). They contain the information about the local descriptor tables. The local descriptor tables contain the segment descriptors which are unique to a particular task. The DPL (descriptor privilege level) field of this descriptor is ignored because it can be accessed only at privilege level 0.

b) TSS Descriptor (S = 0, Type = 1, 3, 9, B) : In a multitasking environment the computer performs more than one task at a time, and it also switches between the tasks. A task can be a single program, or it can be a group of related programs. When it switches from task 1 to task 2, it stores all the information necessary to restart task 1 later exactly as it was left. This involves saving the contents of all of the processor registers as well as any read/write memory variables and the address of the next instruction to be executed. Such information is called the state of the task or the context of the task.
Type : Defines
0 : Invalid
1 : Available 80286 TSS
2 : LDT
3 : Busy 80286 TSS
4 : 80286 Call Gate
5 : Task Gate (for 80286 or Intel Pentium processor task)
6 : 80286 Interrupt Gate
7 : 80286 Trap Gate
8 : Invalid
9 : Available Intel Pentium processor TSS
A : Undefined (Intel Reserved)
B : Busy Intel Pentium processor TSS
C : Intel Pentium processor Call Gate
D : Undefined (Intel Reserved)
E : Intel Pentium processor Interrupt Gate
F : Intel Pentium processor Trap Gate
Note : In a maximum-size segment (i.e. a segment with G = 1 and segment limit 19...0 = FFFFFH), the lowest 12 bits of the segment base should be zero (i.e. segment base 11...0 = 000H).
Fig. 4.10 System segment descriptor

The Pentium processor uses a special segment called the task state segment (TSS) to store the state/context of the task. This segment can be addressed with the help of the task state segment (TSS) descriptor. The TSS descriptor contains information about the location, size and privilege level of a TSS. Along with the context of the task, the TSS also contains the linkage field for the next task, which allows the nesting of tasks.

The TSS descriptor gives the base address and limit for the TSS. Its TYPE field is used to indicate whether the task is currently BUSY (i.e. on a chain of active tasks) or the TSS is available. The TYPE field also indicates whether the segment contains an 80286 or a Pentium processor TSS.

c) Gate Descriptors (S = 0, TYPE = 4 - 7, C, E, F) : A gate is a special type of descriptor. It allows the Pentium processor to automatically perform protection checks. There are four types of gate descriptors :
• Call gate
• Task gate
• Interrupt gate
• Trap gate

Call gates are used to change privilege levels (section 4.8.2), task gates are used to perform a task switch (section 4.12.5), and interrupt and trap gates are used to specify interrupt service routines. Fig. 4.11 shows the format of the four types of gate descriptors.
The fields labelled in Fig. 4.11 are :
TYPE : 4 = 80286 call gate, 5 = Task gate (for 80286 or Intel Pentium processor task), 6 = 80286 interrupt gate, 7 = 80286 trap gate, C = Intel Pentium processor call gate, E = Intel Pentium processor interrupt gate, F = Intel Pentium processor trap gate
P : 0 = Descriptor contents are not valid, 1 = Descriptor contents are valid
DPL : Least privileged level at which a task may access the gate.
WORD COUNT : 0 - 31, the number of parameters to copy from the caller's stack to the called procedure's stack. The parameters are 32-bit quantities for Intel Pentium processor gates and 16-bit quantities for 80286 gates.
DESTINATION SELECTOR : 16-bit selector to the target code segment, or selector to the target task state segment for a task gate.
DESTINATION OFFSET : Entry point within the target code segment; 16-bit for 80286, 32-bit for Pentium processor.
Fig. 4.11 Gate descriptor formats

4.5.2 Descriptor Tables

As mentioned earlier, segment descriptors are grouped and placed one after the other in contiguous memory locations. This group arrangement is known as a descriptor table. The maximum limit for the length of a descriptor table is 64 Kbytes, and we know that each descriptor takes 8 bytes to store the information of a particular segment. So a descriptor table can have as many as 8192 descriptors. The upper 13 bits of a selector are used as an index into the descriptor table.

There are three types of descriptor tables :
• Global Descriptor Table (GDT)
• Local Descriptor Table (LDT)
• Interrupt Descriptor Table (IDT)

Each of these is used for a different purpose. Thus it is necessary to consider the use of a segment before deciding in which table it must be included.

The Global Descriptor Table (GDT) is a general purpose table of descriptors which can be used by all programs to reference segments of memory. The GDT can have any type of segment descriptor except for descriptors which are used for serving interrupts.
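The capacity figure quoted above follows from simple arithmetic, which can be checked with a short sketch. The names DESCRIPTOR_BYTES, MAX_TABLE_BYTES, descriptor_capacity and index_fits are hypothetical and chosen only for this illustration.

```python
DESCRIPTOR_BYTES = 8          # each descriptor takes 8 bytes
MAX_TABLE_BYTES = 64 * 1024   # maximum length of a descriptor table (64 Kbytes)


def descriptor_capacity(table_bytes: int = MAX_TABLE_BYTES) -> int:
    """Number of complete 8-byte descriptors that fit in a table of this size."""
    return table_bytes // DESCRIPTOR_BYTES


def index_fits(index: int, table_bytes: int) -> bool:
    """True if descriptor number `index` lies wholly inside the table."""
    return index >= 0 and (index + 1) * DESCRIPTOR_BYTES <= table_bytes


print(descriptor_capacity())            # capacity of a maximum-size table
print(descriptor_capacity() == 2 ** 13)  # matches the 13-bit selector index
```

The capacity of a maximum-size table equals 2^13, which is why a 13-bit index in the selector is sufficient to reach every descriptor.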
The Interrupt Descriptor Table (IDT) holds the segment descriptors that define interrupt or exception handling routines. The IDT is a direct replacement for the interrupt vector table used in the 8086 system.

A Local Descriptor Table (LDT) is set up in the system for an individual task or a closely related group of tasks. Fig. 4.12 shows how each task uses its individual memory area defined by the descriptors from the corresponding local descriptor table and how it shares the memory area defined by the descriptors from the global descriptor table.

Fig. 4.12 Memory area shared by different tasks

Descriptor Tables
1. Global Descriptor Table (GDT)
• Unique table
• Holds most of the segments; can be used by all programs
• May contain special system descriptors
2. Interrupt Descriptor Table (IDT)
• Unique table
• Holds segment descriptors that define interrupt or exception service routines
3. Local Descriptor Table (LDT)
• Is optional
• Extends range of GDT
• Set up for individual task

As we know, the descriptors are stored in the descriptor tables. But it is important to know where these tables are stored. It is possible to place descriptor tables anywhere in the processor's address space and it is not necessary to keep them together. Each of the tables has a register associated with it : the GDTR, the LDTR and the IDTR. Each of these registers contains the 32-bit linear address of the base of its descriptor table and the table's limit. The base address of a descriptor table is the linear address of the first byte of the first descriptor in the table. The limit specifies how long the table is and therefore how many descriptors it has.

Global Descriptor Table Register (GDTR) : Fig.
4.13 shows how the contents of the global descriptor table register are used to define a Global Descriptor Table in the Pentium processor physical memory address space. GDTR is a 48-bit register located inside the Pentium processor. The lower two bytes of this register specify the LIMIT (in bytes) for the GDT. The value of the limit is one less than the actual size of the table. For example, if LIMIT is 03FFH then the table is 1024 (1023 + 1) bytes in length (03FFH = 1023 decimal). Since the LIMIT field is 16 bits long, the GDT can grow up to 65,536 bytes long. The upper four bytes of GDTR specify the 32-bit linear address of the base of the Global Descriptor Table (GDT).

Fig. 4.13 GDTR and GDT

Interrupt Descriptor Table Register (IDTR) : Like the Global Descriptor Table Register, the Interrupt Descriptor Table Register holds the 16-bit limit and 32-bit linear address of the base of the Interrupt Descriptor Table (IDT). Fig. 4.14 shows how the contents of the Interrupt Descriptor Table Register are used to define an Interrupt Descriptor Table (IDT) in the Pentium processor physical memory address space.

Fig. 4.14 IDTR and IDT

Like GDTR, the IDTR is also 48 bits in length; the lower two bytes define the limit and the upper four bytes define the base address. Since the limit field is two bytes, the IDT can also be up to 65,536 bytes long. But the Pentium processor only supports up to 256 interrupts or exceptions; therefore, the size of the IDT should not be set to support more than 256 interrupts.

Local Descriptor Table Register (LDTR) : Unlike GDTR and IDTR, the LDTR is a 16-bit register.
It does not itself specify any limit or base address for the table; instead it holds a selector that points to an LDT descriptor stored in the Global Descriptor Table (GDT). Fig. 4.15 shows the LDTR, GDT and LDT, and how the contents of the LDTR are used indirectly to define a Local Descriptor Table.

Fig. 4.15 Global and local descriptor tables

Whenever a selector is loaded into the LDTR, the corresponding descriptor is located in the global descriptor table. The contents of this descriptor define the local descriptor table. The 32-bit base value defines the starting point of the table in the Pentium processor physical memory address space and the 16-bit limit specifies the size of the table. The GDT can contain many LDT descriptors. To put a particular LDT in service, it is necessary to load the LDTR with the corresponding selector.

For loading the values in the GDTR, IDTR and LDTR registers, the Pentium processor provides the LGDT, LIDT and LLDT instructions. It also provides the SGDT, SIDT and SLDT instructions for storing them. SGDT and SIDT copy the 48-bit contents of the GDTR or IDTR into the six bytes of memory pointed to by the destination operand; SLDT stores the 16-bit selector held in the LDTR. These tables are manipulated by the operating system. Thus, the instructions used for loading the descriptor table registers are privileged instructions.

4.5.3 More about Segment Registers
From the previous discussion, we know that the contents of a segment register are used as a selector to select a specific descriptor from the descriptor table. This part of the segment register is visible to the programmer. Fig. 4.16 shows the complete segment register with its visible and hidden parts. The hidden part is referred to as the segment descriptor cache register.
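The way a selector picks a descriptor can be sketched in a few lines of Python. This is only an illustration: it assumes the usual selector layout (descriptor index in bits 15-3, table indicator TI in bit 2, RPL in bits 1-0), and the function name decode_selector and the sample value 001BH are made up for the example.

```python
def decode_selector(selector):
    """Split a 16-bit segment selector into (index, ti, rpl)."""
    if not 0 <= selector <= 0xFFFF:
        raise ValueError("selector must fit in 16 bits")
    index = selector >> 3       # bits 15..3: descriptor number within the table
    ti = (selector >> 2) & 1    # bit 2: 0 selects the GDT, 1 the current LDT
    rpl = selector & 0b11       # bits 1..0: requester's privilege level
    return index, ti, rpl


index, ti, rpl = decode_selector(0x001B)
table = "LDT" if ti else "GDT"
print(f"descriptor {index} of the {table}, RPL = {rpl}")
# prints: descriptor 3 of the GDT, RPL = 3
```

Only the visible 16 bits are modelled here; the hidden descriptor cache is filled by the processor and cannot be read this way by a program.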
Using the segment descriptor cache registers, the Pentium processor stores information from the descriptor, thereby avoiding the need to consult a descriptor table every time it accesses memory. The contents of the segment register (visible portion) are manipulated by programs, whereas the contents of the segment descriptor cache register (hidden portion) are manipulated by the processor. Once the descriptors are cached, subsequent references to them are performed without any overhead for loading the descriptor. This is the biggest advantage of the segment descriptor cache registers.

Fig. 4.16 Segment register and segment descriptor cache (16-bit visible selector and hidden descriptor for CS, SS, DS, ES, FS and GS)

Example 4.1 : Assume (DS) = 0204H and (ESI) = 00002000H. Paging is disabled and the processor is in protected mode.
1. From which of the three descriptor tables (IDT, LDT, GDT) will the descriptor be taken ? Give the descriptor number.
2. Assume appropriate values in the selected descriptor and explain how the address translation takes place when the following instruction is executed.
   MOV AX, [ESI]

Solution :
1. Here, the DS register is used as a selector. Fig. 4.17 shows the definitions of the selector bits; the contents of DS are 0204H = 0000 0010 0000 0100B.

Fig. 4.17 Selector bits in the DS register

From the figure we can see that
   RPL = 00
   TI = 1
Since the TI (Table Indicator) bit is set, the descriptor from the current LDT will be referred. The index field (bits 15 to 3) is 0 0000 0100 0000B = 64, so descriptor number 64 of the LDT is selected.

2. The descriptor gives the segment base address and the segment limit. Let us assume segment base address = 0000 0000H and limit = FFFFFH. As paging is disabled, the physical address of memory is given by
   PA = Base address + Offset
Note : Offset ≤ segment limit. In our case the offset is given by ESI (0000 2000H), which is within the segment limit.
Therefore the physical address of memory is
   PA = 0000 0000H + 0000 2000H = 0000 2000H
When the MOV AX, [ESI] instruction is executed, the contents of memory location 0000 2000H are copied into the AL register and the contents of memory location 0000 2001H are copied into the AH register.

4.6 Paging
Paging or page translation is the second phase of address translation. In this phase the Pentium processor transforms a linear address generated by segment translation into a physical address. The page translation step is optional. Page translation is in effect only when the PG bit of CR0 is set. Page translation is a must if the operating system is to implement multiple virtual 8086 tasks, page-oriented protection, or page-oriented virtual memory.

When paging is enabled, the paging unit arranges the physical address space into 1,048,576 pages that are each 4096 bytes long. Fig. 4.18 shows the organization of the physical address space using paging.

Fig. 4.18 Paged organization of the physical address space (4 KB pages numbered 0 to 1,048,575)

4.6.1 Support Registers and Tables
There are three components to the paging mechanism of the Pentium processor : the page directory, the page tables, and the page itself (page frame or page). Like segmentation, paging depends on special memory-resident tables. Of the three components, the page directory and the page tables are in table form. Both are made up of 32-bit descriptors. Unlike tables of segment descriptors, each page directory or page table must contain exactly 1024 descriptors, making each directory or table exactly 4096 bytes (4 KB) long. A page frame is a 4 Kbyte unit of contiguous addresses of physical memory.

When paging is enabled, the linear address generated by the segment translation process is not used as a physical address.
The Pentium processor uses two levels of tables to translate the linear address (from the segment translation) into a physical address. Fig. 4.19 shows the format of the linear address.

Fig. 4.19 Linear address format (DIR : bits 31-22, PAGE : bits 21-12, OFFSET : bits 11-0)

The processor internally divides a linear address into three fields : two fields of 10 bits each and one field of 12 bits. The most significant 10 bits (DIR field) of the linear address are used as an index into a page directory. The next most significant 10 bits (PAGE field) of the linear address are used as an index into the page table determined by the page directory. The least significant 12 bits (OFFSET) select one of 4096 bytes of memory from the page frame determined by the page table. The physical address of the current page directory is stored in the control register CR3, which is also referred to as the page directory base register (PDBR). Fig. 4.20 shows how the Pentium processor converts the DIR, PAGE and OFFSET fields of a linear address into the physical address by consulting two levels of page tables.

Fig. 4.20 Linear to physical address translation (CR3, page directory, page table, page frame)

The descriptor in a page directory is referred to as a Page Directory Entry (PDE) and the descriptor in a page table is referred to as a Page Table Entry (PTE).

4.6.2 PDE Descriptor
Fig. 4.21 shows the format of a page directory entry. A page directory entry has six fields.

Fig. 4.21 Page directory entry

Page Table Address : The page table address specifies the physical starting address of the base of a page table. This field specifies the 20 most significant bits; the remaining 12 bits are all 0's. This locates all page tables on 4K boundaries.

User/Avail : Bits 9, 10, and 11 are not used by the Pentium processor. Users are free to use them.
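As an aside before the remaining PDE fields, the division of a linear address into the DIR, PAGE and OFFSET fields described in section 4.6.1 can be sketched as follows. The function name split_linear_address and the sample address 00403025H are illustrative assumptions, not values taken from the figures.

```python
def split_linear_address(linear):
    """Return the (DIR, PAGE, OFFSET) fields of a 32-bit linear address."""
    directory = (linear >> 22) & 0x3FF   # bits 31..22: index into the page directory
    page = (linear >> 12) & 0x3FF        # bits 21..12: index into the page table
    offset = linear & 0xFFF              # bits 11..0: byte within the 4 KB page frame
    return directory, page, offset


d, p, o = split_linear_address(0x00403025)
print(d, p, hex(o))   # prints: 1 3 0x25

# Size check: 1024 entries of 4 bytes each fill exactly one 4096-byte table.
print(1024 * 4)       # prints: 4096
```

A 10-bit index can address 1024 entries, which is why each page directory and page table holds exactly 1024 descriptors.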
Accessed Bit : The Pentium processor automatically sets the accessed bit whenever the PDE is used in an address translation or another page-related function. It is never cleared unless you write code to do it manually.

User/Supervisor and Read/Write Bits : These bits are not used for address translation, but are used for page-level protection, which the Pentium processor performs at the same time as address translation. If the User/Supervisor bit is set, the memory pages covered by this PDE are accessible from all privilege levels. If it is cleared, the pages are accessible only from PL0, 1 and 2. If the User/Supervisor bit is cleared, the Read/Write bit has no effect. But if the User/Supervisor bit is set and the Read/Write bit is 0, the memory pages covered by this PDE are write protected for PL3 code. If the Read/Write bit is set, write privileges are allowed from PL3 code. The access rights just discussed are summarized in Table 4.2.

U/S   R/W   Permitted access, level 3   Permitted access, levels 0, 1 or 2
 0     0    None                        Read/Write
 0     1    None                        Read/Write
 1     0    Read-only                   Read/Write
 1     1    Read/Write                  Read/Write
Table 4.2 Protection provided by R/W and U/S

Present : The present bit indicates whether a page directory entry can be used in address translation. P = 1 indicates that the entry can be used and that the page table pointed to by the PDE is present in physical memory. If P = 0, the page table referred to is not present. Fig. 4.22 shows the format of a not-present page descriptor.

Fig. 4.22 Not present page descriptor

4.6.3 PTE Descriptor
Fig. 4.23 shows the format of a page table entry. A page table entry has seven fields.

Fig. 4.23 Page table entry

Page Frame Address : The page frame address specifies the physical starting address of a 4 KB page frame or page. This field specifies the 20 most significant bits; the remaining 12 bits are all 0's. This locates all pages on 4K boundaries.
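Putting the pieces together, the two-level lookup (CR3 to PDE, PDE to PTE, PTE plus OFFSET to physical address) can be modelled with a small simulation. This is a sketch under stated assumptions: physical memory is represented by a Python dictionary mapping addresses to 32-bit entries, the present bit is taken to be bit 0 of each entry, and the names translate, NotPresent and the table addresses are invented for the example. Protection checks and the accessed and dirty bits are left out.

```python
PRESENT = 0x1            # bit 0 of a PDE or PTE (P bit)
ADDR_MASK = 0xFFFFF000   # upper 20 bits: 4 KB-aligned table or frame address


class NotPresent(Exception):
    """Hypothetical stand-in for the fault raised when P = 0."""


def translate(linear, cr3, memory):
    """Walk the page directory and page table stored in `memory`."""
    directory = (linear >> 22) & 0x3FF
    page = (linear >> 12) & 0x3FF
    offset = linear & 0xFFF

    # Each entry is 4 bytes, so entry n sits at table base + 4 * n.
    pde = memory.get((cr3 & ADDR_MASK) + 4 * directory, 0)
    if not pde & PRESENT:
        raise NotPresent("page table not present")

    pte = memory.get((pde & ADDR_MASK) + 4 * page, 0)
    if not pte & PRESENT:
        raise NotPresent("page not present")

    # Page frame address supplies bits 31..12, OFFSET supplies bits 11..0.
    return (pte & ADDR_MASK) | offset


memory = {
    0x00001000 + 4 * 1: 0x00002000 | PRESENT,   # PDE 1 -> page table at 00002000H
    0x00002000 + 4 * 3: 0x00345000 | PRESENT,   # PTE 3 -> page frame at 00345000H
}
print(hex(translate(0x00403025, cr3=0x00001000, memory=memory)))
# prints: 0x345025
```

Because the low 12 bits of a page table address or page frame address are always zero, the same bit positions in the entry are free to carry the P, R/W, U/S and other flag bits.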
User/Avail Bits : Bits 9, 10 and 11 are not used by the Pentium processor. Users are free to use them.

Accessed Bit : The accessed bit is set by the Pentium processor whenever this PTE is used in a paging-related function. The Pentium processor never clears this bit. Software can keep track of the most often used pages of memory by periodically testing and clearing this bit in all PTEs.

Dirty Bit : The dirty bit is automatically set by the Pentium processor whenever the page frame covered by the PTE is written into. The Pentium processor never clears this bit. Software can keep track of the most often written pages of memory by periodically testing and clearing this bit.

User/Supervisor and Read/Write Bits : These bits are not used for address translation, but are used for page-level protection, which the Pentium processor performs at the same time as address translation. If the User/Supervisor bit is set, the memory pages covered by this PTE are accessible from all privilege levels. If it is cleared, the pages are accessible only from PL0, 1 and 2. If the User/Supervisor bit is cleared, the Read/Write bit has no effect. But if the User/Supervisor bit is set and the Read/Write bit is 0, the memory pages covered by this PTE are write protected for PL3 code. If the Read/Write bit is set, write privileges are allowed from PL3 code.

Present : The present bit indicates whether a page table entry can be used in address translation. P = 1 indicates that the entry can be used and that the page frame pointed to by the PTE is present in physical memory. If P = 0, the page referred to is not present.

Fig. 4.24 shows both phases of address translation. It shows how a logical address is converted into a physical address when paging is enabled.

Fig. 4.24 Protected mode address translation (segment base address (32 bit) plus offset gives the linear address; CR3, page directory, page table and page frame give the physical address)

4.7 Translation Lookaside Buffer or Page Translation Cache
The Pentium processor paging mechanism is designed to support demand-paged virtual memory systems. However, performance would degrade substantially if the processor were required to access two levels of tables (page directory and page table) for every memory access. To solve this problem, the Pentium processor stores the most recently used page table entries in an on-chip cache. This cache is called the Translation Lookaside Buffer (TLB). The TLB holds up to 32 page table entries. The 32-entry TLB, coupled with a 4K page size, results in coverage of 128K bytes of memory addresses.

Whenever a program generates a linear address that maps to a page table entry (PTE) already in the cache, the Pentium processor uses the cached information. This saves two external memory references (one to the page directory and one to the page table), improving performance in address translation. For many common multi-tasking systems, the TLB will have a hit rate of about 98%. This means that the processor will have to access the two-level page structure on only 2% of all memory accesses. Fig. 4.25 illustrates how the TLB supports the Pentium processor paging mechanism.

Fig. 4.25 Translation lookaside buffer (32 entries; linear address, page directory, page table, physical memory)

4.8 Paging Operation
The paging mechanism receives a 32-bit linear address from the segmentation unit. The upper 20 bits of the linear address are compared with all 32 entries in the TLB to determine if there is a match. If there is a match (i.e. a TLB hit), then the 32-bit physical address is calculated and placed on the address bus. However, if the page table entry is not in the TLB, the Pentium processor reads the appropriate Page Directory Entry.
If P = 1 in the Page Directory Entry, indicating that the page table is in memory, the Pentium processor reads the appropriate Page Table Entry and sets the Accessed bit. If P = 1 in the Page Table Entry, indicating that the page is in memory, the processor updates the Accessed and Dirty bits as needed and fetches the operand. The Pentium processor then stores the upper 20 bits of the linear address, together with the entry read from the page table, in the TLB for future accesses. However, if P = 0 for either the Page Directory Entry or the Page Table Entry, the Pentium processor generates a page fault (exception 14).

The Pentium processor also generates a page fault (exception 14) if the memory access violates the page protection attributes, i.e. U/S or R/W (e.g. trying to write to a read-only page).

If the Pentium processor wants to access physical memory whose translation is not in the cache, it examines the 32 existing cache entries and throws out the least recently used PTE. It then puts the new PTE in its place. This method of updating the cache is known as LRU (Least Recently Used).

It is necessary to flush the entire cache whenever the page tables are changed. The page translation cache is invisible to application programmers but visible to system programmers. Thus system programmers can flush the cache by using the following methods.
1. By reloading CR3 with a MOV instruction. For example : MOV CR3, EAX
2. By performing a task switch to a TSS that has a different CR3 image than the current TSS (task switching is explained in more detail later in this chapter).

4.9 Protection
A problem may occur in multitasking operating systems or multi-user systems when two or more users attempt to read and change the contents of a memory location at the same time.
The section of a program where the value of a variable is being read and changed (the critical section) must be protected from access by other tasks until the operation is complete. Another region that requires protection is the operating system code. An incorrect address in a user program may cause the program to write over critical sections of the operating system, corrupting the operating system code and data areas. The system then 'locks up' and the only way to get control again is to reboot the system. In a multitasking system this is intolerable, so several methods are used to protect the operating system.

The Pentium processor uses segment level protection and privilege level protection mechanisms to protect critical sections. When an attempt is made to access a segment by loading a segment selector into the visible part of a segment register, the protection mechanism of the Pentium processor makes several checks, such as type checking, limit checking, privilege level checking and so on. In this section we are going to study the protection mechanisms provided by the Pentium processor to keep the system relatively safe from accidental mishaps.

4.9.1 Protection By Segmentation
When an attempt is made to access a segment, the Pentium processor first checks whether the descriptor table indexed by the selector contains a valid descriptor for that selector. If the selector attempts to access a location outside the limit of the descriptor table, or the location indexed by the selector in the descriptor table does not contain a valid descriptor, then an exception is produced. The Pentium processor also checks whether the segment descriptor is of the right type to be loaded into the specified segment register cache. The descriptor for a read-only data segment, for example, cannot be loaded into the SS register, because a stack must be able to be written to.
A selector for a code segment which has been marked "execute only" cannot be loaded into the DS register to allow reading the contents of the segment. If all the above protection conditions are met, the limit, base, and access rights bytes of the segment descriptor are copied into the hidden part of the segment register. The Pentium processor then checks the P (Present) bit of the access byte to see if the segment for that descriptor is present. If it is not present, a type 11 exception is generated.

After a segment selector and descriptor are loaded into a segment register, further checks are made each time a location in the actual segment is accessed. These checks are type checking and limit checking.

Type Checking
The type field of the descriptor specifies the type of the descriptor and the intended usage of the segment. As mentioned in the previous section, the W (Writable), R (Readable), C (Conforming), A (Accessed) and E (Expand-Down) bits of the type field specify the usage of the segment and restrict the segment to a particular use only. For example, if the W bit of a data segment is 0, the segment is a read-only segment; access to it is limited to reading. Type checking is used to detect whether any program is attempting to use segments in ways not intended by the programmer.

Limit Checking
The Pentium processor uses the limit field of a segment descriptor to prevent programs from addressing outside the segment. It interprets the limit field depending on the setting of the G (granularity) bit, which specifies whether the limit value counts units of 1 byte or 4 Kbytes. In the case of data segments the processor also checks the ED (Expansion Direction) bit and the B (Big) bit. For all types of segments except expand-down data segments, the value of the limit is one less than the size (expressed in bytes) of the segment.
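The effect of the G bit on the limit can be illustrated with a short sketch for normal (not expand-down) segments. It assumes a 20-bit limit field, as in the FFFFFH value used in Example 4.1, and that with G = 1 the limit is scaled by 4 KB with the low 12 bits of the offset left unchecked; the function names are hypothetical.

```python
def effective_limit(limit_field, g):
    """Last valid byte offset of a normal (not expand-down) segment."""
    if not 0 <= limit_field <= 0xFFFFF:
        raise ValueError("limit field is 20 bits wide")
    if g:
        # 4 KB granularity: the limit counts 4 KB units, so the low
        # 12 bits of the offset always fall inside the last unit.
        return (limit_field << 12) | 0xFFF
    return limit_field            # byte granularity


def access_in_bounds(offset, size, limit_field, g):
    """True if all `size` bytes starting at `offset` lie inside the segment."""
    return offset + size - 1 <= effective_limit(limit_field, g)


print(hex(effective_limit(0xFFFFF, 0)))          # prints: 0xfffff    (1 MB segment)
print(hex(effective_limit(0xFFFFF, 1)))          # prints: 0xffffffff (4 GB segment)
print(access_in_bounds(0xFFFFF, 1, 0xFFFFF, 0))  # prints: True
print(access_in_bounds(0xFFFFF, 2, 0xFFFFF, 0))  # prints: False
```

The size of the segment is the effective limit plus one, which matches the statement that the limit is one less than the segment size.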
The Pentium processor causes a general protection exception when a program attempts to
- Access a memory byte at an address > limit
- Access a memory word at an address ≥ limit
- Access a memory Dword at an address ≥ (limit - 2)

For expand-down data segments, the limit is interpreted differently. In these cases the range of valid addresses is from limit + 1 to either 64K or 2^32 - 1 (4 Gbyte), depending on the B bit.

4.9.2 Privilege Level Protection
The Pentium processor has four levels of protection, which are optimized to support the needs of a multi-tasking operating system to isolate and protect user programs from each other and from the operating system. The four levels of protection are four privilege levels, numbered from 0 to 3. The value 0 represents the highest privilege level and the value 3 represents the lowest privilege level. Fig. 4.26 shows how a Pentium processor protected mode system can be set up with four privilege levels. It shows that the operating system kernel is assigned the highest privilege level, which is privilege level 0 (PL0). System services such as BIOS procedures are assigned PL1, custom device drivers are assigned PL2, and finally application programs are assigned PL3.

Fig. 4.26 Assignment of privilege levels

The Pentium processor assigns these levels to different objects such as descriptors and selectors. The assigned privilege levels are stored in the respective fields as given below.
- Descriptors contain a field called the descriptor privilege level (DPL).
- Selectors contain a field called the requester's privilege level (RPL). The RPL is intended to represent the privilege level of the procedure that originates a selector.
- The Pentium processor stores the descriptors in the internal cache (hidden portion of segment registers) for currently executing segments.
The privilege level of such a descriptor is referred to as the current privilege level (CPL).

Now we see how the Pentium processor evaluates the right of a procedure to access another segment, and thus how it achieves the remaining aspects of protection.

4.9.2.1 Restricting Access to Data
When an attempt is made to access a data segment by loading a segment selector into the visible portion of a data segment register (DS, ES, FS, GS, SS), the Pentium processor automatically makes several checks by comparing privilege levels. The Pentium processor checks three different types of privilege levels, as shown in Fig. 4.27.

Fig. 4.27 Privilege check for data access (CPL - Current Privilege Level, RPL - Requester's Privilege Level, DPL - Descriptor Privilege Level)

1. The CPL (Current Privilege Level)
2. The RPL (Requester's Privilege Level) of the selector used to specify the target segment
3. The DPL of the descriptor of the target segment

A program can load a data segment register only if the DPL of the target segment is numerically greater than or equal to the maximum of the CPL and the selector's RPL. In other words, a procedure can only access data that is at the same or a less privileged level. Table 4.3 gives an exact idea about data access.

No.   DPL   CPL   RPL   Access
1     2     0     1     Valid
2     3     1     2     Valid
3     1     1     0     Valid
4     1     2     0     Invalid
5     2     2     3     Invalid
Table 4.3 Data accesses

4.9.2.2 Accessing Data in Code Segments
It is possible to read data from a code segment. There are three ways of reading data from code segments.
1. Load a data segment register with a selector of a non-conforming, readable, executable segment.
2. Load a data segment register with a selector of a conforming, readable, executable segment.
3. Use a CS override prefix to read a readable, executable segment whose selector is already loaded in the CS register.

In case 1, a procedure can only access data that is at the same or a less privileged level. Case 2 is always valid because the privilege level of a segment whose conforming bit is set is effectively the same as the CPL, regardless of its DPL. Case 3 is also always valid because the DPL of the code segment in CS is, by definition, equal to the CPL.

Fig. 4.28 Privilege check for control transfer (CPL - Current Privilege Level, RPL - Requester's Privilege Level, DPL - Descriptor Privilege Level)

4.9.2.3 Restricting Control Transfers
The Pentium processor can transfer program control with the help of the JMP, CALL, RET, INT and IRET instructions. The "near" forms of JMP, CALL and RET transfer control within the current segment, so these are subjected only to limit checking. But in the case of far JMP, CALL and RET transfers, control is transferred to another segment. In such cases the Pentium processor performs privilege checking.

To successfully transfer control to another segment, both the RPL and the CPL must be numerically less than or equal to the DPL of the segment. In other words, the privilege level of the requesting selector and the current privilege level must both be the same as or more privileged than the privilege level of the desired segment.
   Max (CPL, RPL) ≤ DPL

4.9.3 Inter-privilege Level Transfer of Control
After looking at all these restrictions, the question that might come to mind at this point is : if a task cannot access a segment with a more privileged (numerically lower) DPL, how can user programs access the operating system kernel, BIOS, or utility procedures in segments which have more privileged (numerically lower) DPLs ?

There are two ways to access a procedure located in a segment which has a higher privilege level.
1. The first option has the restriction that the segment which has a higher privilege level must be a conforming code segment.
2. The second option is more complex, but allows access to the segment which has a higher privilege level using a special structure known as a Call Gate.

4.9.3.1 Conforming Code Segment
A code segment is considered conforming if bit 2 of the access rights byte of its descriptor is set. Conforming code segments have no inherent privilege level of their own; they conform to the level of the code that CALLs them or JMPs to them. For example, if a program in a PL3 segment transfers control to a conforming code segment, then the conforming code runs with CPL equal to 3. If the same segment is invoked by PL0 code, it runs with a CPL of 0.

When control is transferred to a conforming code segment, the RPL bits of register CS are not changed to match the DPL of the segment, as they normally would be. Instead, they still reflect the correct CPL, i.e. the DPL of the last non-conforming code segment that was executed. This is the only time that the RPL bits in the CS register might not match the DPL bits of the currently executing segment.

Even though conforming code segments do not have any particular privilege level associated with them, there is still one restriction regarding when a conforming segment can be used. The DPL of the conforming descriptor must always be numerically less than or equal to the current CPL. You can never transfer control to a segment whose DPL is greater (less privileged) than that of the current segment. This is because control must eventually be transferred back from the conforming code segment to the original segment, which involves a change in privilege level. The conforming code segment must therefore have a higher or the same privilege level as the original segment to allow control to return to the original segment. Table 4.4 gives an exact idea about access to conforming code segments.

No.   Current Privilege Level (CPL)   DPL of Conforming Code Segment   Access
1     3                               2                                Valid
2     2                               0                                Valid
3     1                               1                                Valid
4     1                               2                                Invalid
5     2                               3                                Invalid
Table 4.4 Accessing of conforming code segment

4.9.3.2 Call Gates
A call gate is simply a special type of descriptor, as shown in Fig. 4.29. Unlike code, data, or stack descriptors or the system type LDT descriptor, call gate descriptors do not define any memory space. They have no base address or limit fields. In that sense they are not really descriptors at all, but it is convenient to place them in descriptor tables. A call gate acts as an interface layer between code segments at different privilege levels.

The call gate is the only mechanism that allows a call to a procedure located in any segment (conforming or non-conforming) which has a higher privilege level. JMPs are not allowed; hence the name "call gate". It is important that the CALL refers to a call gate, not to the destination code segment. The call gate defines the code segment and the exact offset to which control is to be transferred. Users are not allowed to specify the desired offset in their programs, because a wrong offset may corrupt the procedure if control is transferred into the middle of a subroutine or, worse yet, into the middle of an instruction.

A call gate descriptor is put in the GDT or in an LDT, just like segment and other descriptors. When a program does a CALL to a procedure in another segment, the selector for that segment's call gate is placed into the visible portion of the CS register, and the call gate descriptor is placed in the hidden portion of the CS register.

Fig. 4.29 Format of Pentium processor call gate

The call gate descriptor contains two important things :
1. Selector, which points to the descriptor for the segment where the procedure is actually loaded.
2. Offset of the called procedure in its segment.
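Looking back at the privilege comparisons tabulated in Table 4.3 and Table 4.4, the two rules can be checked mechanically. The sketch below encodes the rows of both tables and prints the verdict of each rule; the function names are invented for the illustration, and the code models only the numeric comparisons, not the type and limit checks that the processor also performs.

```python
def data_access_allowed(dpl, cpl, rpl):
    """Data segment register load: DPL must be >= max(CPL, RPL) numerically."""
    return dpl >= max(cpl, rpl)


def conforming_transfer_allowed(cpl, dpl):
    """Transfer to a conforming code segment: its DPL must be <= CPL numerically."""
    return dpl <= cpl


# Rows of Table 4.3 as (DPL, CPL, RPL)
table_4_3 = [(2, 0, 1), (3, 1, 2), (1, 1, 0), (1, 2, 0), (2, 2, 3)]
# Rows of Table 4.4 as (CPL, DPL of conforming code segment)
table_4_4 = [(3, 2), (2, 0), (1, 1), (1, 2), (2, 3)]

for row in table_4_3:
    print("data access", row, "Valid" if data_access_allowed(*row) else "Invalid")
for row in table_4_4:
    print("conforming", row, "Valid" if conforming_transfer_allowed(*row) else "Invalid")
```

In both tables the first three rows come out valid and the last two invalid, in agreement with the Access column.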
If the call is valid, the selector from the call gate (which points to the descriptor for the segment where the procedure is actually loaded) is placed in the visible portion of the CS register and the corresponding segment descriptor is loaded into the hidden portion of the CS register. The Pentium processor then uses the base address from the segment descriptor and the offset from the call gate descriptor to calculate the physical address of the called procedure, as shown in Fig. 4.30.

Fig. 4.30 Indirect transfer via call gate (selector and offset of the CALL operand, gate descriptor and code segment descriptor in the descriptor table)

Call Gates
- Are defined like segment descriptors
- Do not define any memory space
- Occupy a slot in the descriptor tables
- Provide the only means to alter the current privilege level
- Define entry points to other privilege levels
- Must be invoked with a CALL instruction

During this process the validity of the control transfer is checked using four different privilege levels :
1. The CPL (Current Privilege Level)
2. The RPL (Requester's Privilege Level) of the selector used to specify the call gate
3. The DPL of the gate descriptor
4. The DPL of the descriptor of the target executable segment

For a valid control transfer, the transfer must satisfy the following privilege rule for the CALL instruction, as shown in Fig. 4.31 (a).
   Target DPL ≤ Max (RPL, CPL) ≤ Gate DPL
For example, if you are running in a PL2 code segment (CPL = 2), and you want to call a PL0 procedure (target DPL = 0), you must use a gate to that procedure with a DPL of 2 or 3. Fig. 4.31 (b) shows some valid accesses to higher privilege levels.

Fig. 4.31 (a) Privilege check via call gate (CPL - Current Privilege Level, RPL - Requester's Privilege Level, DPL - Descriptor Privilege Level)

Privilege requirements to use a call gate :
- Call gate DPL must be numerically greater than or equal to the current privilege level
- Call gate DPL must be numerically greater than or equal to the RPL of the gate selector
- Call gate DPL must be numerically greater than or equal to the target code segment DPL
- Target code segment DPL must be numerically less than or equal to the current privilege level

Fig. 4.31 (b) Some valid accesses to higher privilege levels using call gates

With call gates, the procedure is accessed indirectly, through the call gate descriptor, rather than directly through a segment descriptor. This indirect access has two major advantages.
1. This approach permits another level of privilege checking before access to the procedure in the higher privileged segment. The privilege level of the calling program (CPL) is compared with the DPL of the call gate. If the privilege level of the calling program (CPL) is numerically greater than the DPL of the call gate, the access will not be allowed.
2. User programs cannot accidentally enter higher privileged segments at just any point. If they are going to enter at all, they must enter at the specific offset contained in the call gate descriptor.

4.9.4 Changing Stacks
A change in privilege level changes the address domain of the program. The Pentium processor also changes stacks when the privilege level changes. When a call gate causes a change in privilege, the stack segment and pointer are saved, and a new stack is used that corresponds to the new, inner privilege level. When control is returned to the outer level code, the use of the original stack is restored.

If there is a valid call through a gate, the Pentium processor uses a new stack.
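Returning briefly to the privilege requirements listed for call gates, the four conditions can be written as a single test. This is a sketch only: the function name and argument order are assumptions made for the illustration, and the code checks the four listed requirements literally rather than modelling the processor's full CALL sequence.

```python
def call_gate_allowed(cpl, rpl, gate_dpl, target_dpl):
    """Apply the four listed privilege requirements for a CALL through a gate."""
    return (
        gate_dpl >= cpl             # gate must be reachable from the current level
        and gate_dpl >= rpl         # ... and from the requester's level
        and gate_dpl >= target_dpl  # gate is no more privileged than its target
        and target_dpl <= cpl       # target is the same or more privileged than caller
    )


# Example from the text: CPL = 2 calling a PL0 procedure.
print(call_gate_allowed(cpl=2, rpl=2, gate_dpl=2, target_dpl=0))  # prints: True
print(call_gate_allowed(cpl=2, rpl=2, gate_dpl=3, target_dpl=0))  # prints: True
print(call_gate_allowed(cpl=2, rpl=2, gate_dpl=1, target_dpl=0))  # prints: False
```

A gate with DPL 1 is rejected in the last call because PL2 code is less privileged than the gate itself, which is exactly the extra level of checking that the gate provides.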
It takes the segment selector and the pointer for this stack from the TSS (Note : TSS is discussed in section 5.3). If the user is calling a procedure with privilege level 1 (PL1), the new stack selector and stack pointer are taken from SS1 and ESP1, respectively. The old stack selector and stack pointer are immediately pushed onto this new stack. Then the Pentium processor reads the WC (Word Count) field of the call gate descriptor to find the number of doubleword (32-bit) entries to be copied from the old stack to the new stack. In other words, the WC field decides the number of parameters passed to the new stack. After this, the old CS selector and EIP offset are pushed onto the new stack. Finally, CS is loaded from the selector field of the call gate descriptor, EIP is loaded from the offset field, and execution starts at the new address.

4.9.5 Page Level Protection

Page level protection involves two kinds of protection :
1. Restriction of addressable domain
2. Type checking

The U/S and R/W fields of PDEs and PTEs are used to control access to pages.

4.9.5.1 Restricting Addressable Domain

The U/S bit is 0 for pages that hold the operating system, other system software and related data. Such pages are at supervisor level. When the Pentium processor is executing at supervisor level, all pages are addressable. If the U/S bit is 1, the page is at user level. When the Pentium processor is executing at user level, only pages that belong to the user level are addressable.

4.9.5.2 Type Checking

At the level of page addressing two types of access are defined.
1. Read-only access (R/W = 0)
2. Read/write access (R/W = 1)

When the Pentium processor is executing at supervisor level, all pages are treated as read/write, whereas at user level page access depends on the R/W bit in the PDE and PTE. If the R/W bit is 0, pages are read-only, and if the R/W bit is 1, pages are both readable and writeable.
When the Pentium processor is executing at user level, it cannot access a page that belongs to the supervisor level.

4.10 Privileged Instructions

There are 19 privileged instructions supported by the Pentium processor. Privileged instructions are those that affect the segmentation and protection mechanism, alter the interrupt flag, or perform peripheral I/O. These instructions are divided into two groups.
1. Privileged Instructions (Group I)
2. IOPL - Sensitive Instructions (Group II)

4.10.1 Privileged Instructions

The instructions that affect the system data structures come under the first group. The instructions in this group must be executed when CPL is 0; otherwise the Pentium processor generates a general protection exception. Table 4.5 shows the instructions from group I (Privileged Instructions).

Instruction                         Action
HLT                                 Halts the processor
CLTS                                Clears task-switched flag
LGDT, LIDT, LLDT                    Loads GDT, IDT, LDT registers
LTR                                 Loads task register
LMSW                                Loads machine status word
MOV CRn, REG / MOV REG, CRn         Moves to/from control registers
MOV DRn, REG / MOV REG, DRn         Moves to/from debug registers
MOV TRn, REG / MOV REG, TRn         Moves to/from test registers

Table 4.5 Privileged instructions

4.10.2 IOPL Sensitive Instructions

Here, the IOPL field in the EFLAGS register defines the right to use I/O related instructions. Hence the instructions from this group are called IOPL-sensitive instructions. Table 4.6 shows the IOPL sensitive instructions.

Instruction                         Action
CLI                                 Disables interrupts
STI                                 Enables interrupts
IN, INS                             Inputs data from I/O port
OUT, OUTS                           Outputs data to I/O port

Table 4.6 IOPL - sensitive instructions

In order to execute these instructions, the CPL of a procedure or task must be the same as or a lower number than the number represented by the IOPL bits (CPL ≤ IOPL).
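The two rules of this section can be sketched in a few lines of Python. This is only an illustrative model, not processor code: the function names are hypothetical, and the sketch deliberately ignores the I/O permission bit map that the processor can consult for I/O instructions.

```python
def privileged_instruction_allowed(cpl: int) -> bool:
    """Group I rule: privileged instructions require CPL = 0."""
    _check_level(cpl)
    return cpl == 0


def iopl_sensitive_instruction_allowed(cpl: int, iopl: int) -> bool:
    """Group II rule: IOPL-sensitive instructions require CPL <= IOPL."""
    _check_level(cpl)
    _check_level(iopl)
    return cpl <= iopl


def _check_level(level: int) -> None:
    # Privilege levels are two-bit values, so only 0..3 are meaningful.
    if level not in (0, 1, 2, 3):
        raise ValueError(f"privilege level must be 0..3, got {level}")


if __name__ == "__main__":
    # A PL3 task with IOPL = 0 may not execute CLI, IN or OUT.
    print(iopl_sensitive_instruction_allowed(cpl=3, iopl=0))
    # The same task may do so when the operating system sets IOPL = 3.
    print(iopl_sensitive_instruction_allowed(cpl=3, iopl=3))
```

A failed check in the real processor raises a general protection exception; the sketch simply returns False in that case.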
4.11 Special Protection Mode Instructions

Instruction     Action
SGDT            Store Global Descriptor Table
SIDT            Store Interrupt Descriptor Table
STR             Store Task Register
SLDT            Store Local Descriptor Table
LGDT            Load Global Descriptor Table
LIDT            Load Interrupt Descriptor Table
LTR             Load Task Register
LLDT            Load Local Descriptor Table
ARPL            Adjust Requested Privilege Level
LAR             Load Access Rights
LSL             Load Segment Limit
VERR/VERW       Verify Segment for Reading or Writing
LMSW            Load Machine Status Word (lower 16 bits of CR0)
SMSW            Store Machine Status Word

4.12 Demand Paging

The paging hardware of the Pentium processor has three major capabilities :
• Address translation
• Page - level protection
• Demand paging

In the last section we have seen the address translation mechanism by which a logical address is converted into a physical address when paging is enabled, and we have also seen page level protection. In this section, we are going to see demand paging.

Demand paging allows a system to create a virtual environment for its programs. Neither the program code nor the programmer writing it needs to know how much RAM is really available in the system or where it is located. If a program references a page which is not in the main memory, the Pentium processor calls a page fault handler. This page fault handler routine retrieves the desired data from secondary storage (such as a disk) and places it in memory. The previous contents of that memory are swapped out to the disk. In this way, it is possible to create the impression of a system with a huge amount of main memory. Its actual size and location are never known to the program or the programmer, but everything runs as desired.

4.13 Moving to Protected Mode

The Pentium processor begins execution in real mode immediately after the RESET signal.
To enter protected mode, it is essential to set up system tables such as the Global Descriptor Table (GDT), the Interrupt Descriptor Table (IDT) and the Local Descriptor Table (LDT). At least one GDT and one IDT must be defined in the system. The IDT must be at least 256 bytes long and the GDT must contain at least one code segment descriptor and one data segment descriptor. To enter protected mode it is necessary to load CR0 with the PE bit (bit 0) set, using a MOV instruction to CR0. The PE bit can also be set by the LMSW instruction, which maintains 80286 compatibility. After enabling protected mode, the next instruction should be an intersegment JMP that reloads CS with a selector pointing to a valid code segment descriptor. The remaining segment registers are then initialized with valid selectors.

The following steps accomplish the switch from real mode to protected mode.
• Prepare the GDT with a null descriptor in the first GDT entry, one code segment descriptor, one stack segment descriptor and one data segment descriptor.
• Initialize the interrupt descriptor table so that it contains valid interrupt gates for at least the first 32 interrupt type numbers. The IDT may contain up to 256 8-byte interrupt gates defining all 256 interrupt types.
• Load the base address and limit of the GDT into the GDTR register, using the LGDT instruction.
• Set the PE flag in the CR0 register, using the MOV CR0 or LMSW instruction (the latter for compatibility with the Intel 286).
• Immediately execute an intersegment (far) jump to load the CS register and flush the instruction decode queue.
• Load all the data segment registers with the initial selector values.

Fig. 4.32 (a) shows the tables needed and Fig. 4.32 (b) shows the descriptors needed for a simple protected mode Pentium processor system. The simple protected mode Pentium processor system has a single code segment and a single data/stack segment, each 4 Gbytes long, and a single privilege level, PL = 0.
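As a rough illustration of what the GDT for such a simple system contains, the sketch below packs the 8-byte descriptors in Python. The helper name `encode_descriptor` is hypothetical, and the access-byte values (9AH for an execute/read code segment, 92H for a read/write data segment, both present with DPL = 0) are the conventional flat-model choices, assumed here rather than read from Fig. 4.32 (b).

```python
def encode_descriptor(base: int, limit: int, access: int, flags: int) -> int:
    """Pack a segment descriptor into a 64-bit integer.

    base   : 32-bit segment base address
    limit  : 20-bit segment limit (in 4 KB units when G = 1)
    access : access-rights byte (P, DPL, S and type bits)
    flags  : upper nibble of byte 6 (G, D/B, 0, AVL)
    """
    low = (limit & 0xFFFF) | ((base & 0xFFFF) << 16)
    high = (
        ((base >> 16) & 0xFF)            # base 23..16
        | ((access & 0xFF) << 8)         # access-rights byte
        | (((limit >> 16) & 0xF) << 16)  # limit 19..16
        | ((flags & 0xF) << 20)          # G, D/B, 0, AVL
        | (((base >> 24) & 0xFF) << 24)  # base 31..24
    )
    return (high << 32) | low


# Flat model: base 0, limit FFFFFH with G = 1 (4 KB granularity) gives 4 Gbytes.
FLAGS_4K_32BIT = 0xC  # G = 1, D/B = 1
gdt = [
    0,                                                       # null descriptor
    encode_descriptor(0, 0xFFFFF, 0x9A, FLAGS_4K_32BIT),     # code segment
    encode_descriptor(0, 0xFFFFF, 0x92, FLAGS_4K_32BIT),     # data/stack segment
]

if __name__ == "__main__":
    for index, entry in enumerate(gdt):
        # The selector for entry n in the GDT with RPL = 0 is n * 8.
        print(f"selector {index * 8:04X}H : {entry:016X}H")
```

The GDTR limit for this table would be 3 × 8 − 1 = 23, since the limit field holds the size of the table in bytes minus one.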
Fig. 4.32 (a) Simple protected system

Fig. 4.32 (b) GDT descriptor for simple system

An alternative approach to entering protected mode, which is especially appropriate for multitasking operating systems, is to use the built-in task switch to load all of the registers. In this case the GDT should contain two TSS descriptors in addition to the code and data descriptors needed for the first task. The first JMP instruction in protected mode should jump to the TSS, causing the task switch and loading all of the registers with the values stored in the TSS. The task register (TR) should be initialized to point to a valid TSS descriptor, since a task switch saves the state of the current task in a task state segment.

The steps required for entering protected mode using the alternative approach are as follows :
• Initialize the interrupt descriptor table so that it contains valid interrupt gates for at least the first 32 interrupt type numbers. The IDT may contain up to 256 8-byte interrupt gates defining all 256 interrupt types.
• Initialize the global descriptor table so that it contains at least two task state segment (TSS) descriptors, and the initial code and data segments required for the initial task.
• Initialize the task register (TR) so that it points to a valid TSS descriptor, since a task switch saves the state of the current task in a task state segment.
• Switch to protected mode by using an intersegment JMP to load the CS register and flush the instruction decoder queue.
The first JMP instruction in protected mode would jump to the TSS, causing a task switch and loading all the registers with the values stored in the TSS.

4.14 Switching Back to Real Address Mode

It is possible to enter Real Mode from Protected Mode by resetting the PE bit of the CR0 register with a MOV CR0, reg instruction. Before returning to real mode, however, one must make sure that all the values used by the processor are legal Real Mode values. The following sequence of operations is suggested for returning to Real Mode.

1. If paging is enabled, do the following operations :
   a. Transfer control to linear addresses that have an identity mapping, that is, to addresses where linear addresses are equal to physical addresses.
   b. Clear the PG (Paging) bit in CR0.
   c. Load zeroes into CR3 to clear out the paging cache.
2. Transfer control to a segment that has a limit of 64K (FFFFH). This ensures that the CS register has the 64K limit required in real mode.
3. Load segment registers SS, DS, ES, FS and GS with a selector that points to a descriptor containing the values given in Table 4.7.

Descriptor field     Value
Base                 Any
Limit                64K (FFFFH)
Present              P = 1
Writeable            W = 1
Expand up            E = 0
Byte granular        G = 0

Table 4.7

4. Disable interrupts with the clear interrupt instruction (CLI). A CLI instruction disables INTR interrupts. NMIs can be disabled using external circuitry.
5. Clear the PE bit.
6. Flush the instruction queue by executing a far JMP to the real mode code. This also puts the appropriate values in the access rights of the CS register.
7. Load the base and limit of the real mode interrupt vector table/interrupt descriptor table (IDT) using the LIDT instruction.
8. Enable interrupts.
9. Load the segment registers as required by the real mode code.
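The two CR0 updates in this sequence (clearing PG in step 1 and clearing PE in step 5) amount to simple bit masking. The Python sketch below only models the register values; the function name is hypothetical, and on real hardware these updates are done with privileged MOV CR0 instructions, with the other steps carried out between them.

```python
CR0_PE = 1 << 0    # Protection Enable, bit 0 of CR0
CR0_PG = 1 << 31   # Paging, bit 31 of CR0
MASK_32 = 0xFFFFFFFF


def cr0_values_for_real_mode_return(cr0: int) -> tuple[int, int]:
    """Return CR0 after paging is disabled, then after PE is cleared.

    The two values correspond to step 1b and step 5 of the sequence;
    the other steps must happen between them on a real processor.
    """
    after_paging_off = cr0 & ~CR0_PG & MASK_32
    after_pe_cleared = after_paging_off & ~CR0_PE & MASK_32
    return after_paging_off, after_pe_cleared


if __name__ == "__main__":
    paging_off, real_mode = cr0_values_for_real_mode_return(0x80000011)
    print(f"{paging_off:08X}H {real_mode:08X}H")
```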
4.15 Virtual Memory

In most modern computers, the physical main memory is not as large as the address space spanned by an address issued by the processor. Here, the virtual memory technique is used to extend the apparent size of the physical memory. It uses secondary storage, such as disks, for this purpose. Let us see how this technique works.

When a program does not completely fit into the main memory, it is divided into segments. The segments which are currently being executed are kept in the main memory and the remaining segments are stored in secondary storage devices, such as a magnetic disk. If an executing program needs a segment which is not currently in the main memory, the required segment is copied from the secondary storage device. When a new segment of a program is to be copied into the main memory, it must replace another segment already in the memory. In modern computers, the operating system moves programs and data automatically between the main memory and secondary storage. Techniques that automatically swap program and data blocks between main memory and a secondary storage device are called virtual memory techniques.

The addresses that the processor issues to access either instructions or data are called virtual or logical addresses. These addresses are translated into physical addresses by a combination of hardware and software components. If a virtual address refers to a part of the program or data space that is currently in the main memory, then the contents of the appropriate location in the main memory are accessed immediately. On the other hand, if the referenced address is not in the main memory, its contents must be brought into a suitable location in the main memory before they can be used.

We have seen how virtual memory removes the programming burdens of a small, limited amount of main memory.
Along with this, it also allows efficient and safe sharing of memory among multiple programs. Consider a number of programs running at once on a computer. The total memory required by all the programs may be much larger than the amount of main memory available on the computer, but only a fraction of this memory is actively being used at any point in time. The main memory needs to contain only the active portions of the many programs. This allows us to efficiently share the processor as well as the main memory.

Fig. 4.33 shows a typical memory organisation that implements virtual memory. The memory management unit controls this virtual memory system. It translates virtual addresses into physical addresses. A simple method for translating virtual addresses into physical addresses is to assume that all programs and data are composed of fixed length units called pages, as shown in Fig. 4.34. Pages constitute the basic unit of information that is moved between the main memory and the disk whenever the page translation mechanism determines that swapping is required.

Fig. 4.33 Virtual memory organisation

Fig. 4.34 Paged organisation of the physical address space

4.15.1 Address Translation

In virtual memory, the address is broken into a virtual page number and a page offset. Fig. 4.35 shows the translation of the virtual page number to a physical page number. The physical page number constitutes the upper portion of the physical address, while the page offset, which is not changed, constitutes the lower portion. The number of bits in the page offset field decides the page size.
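The split into page number and offset can be illustrated with a short Python sketch. It assumes 4 KB pages (a 12-bit offset), as in Fig. 4.34, and models the page table as a plain dictionary from virtual page number to physical page number; the names and the sample mapping are made up for the example.

```python
OFFSET_BITS = 12                 # 4 KB pages -> 12-bit page offset
PAGE_SIZE = 1 << OFFSET_BITS


def translate(virtual_address: int, page_table: dict[int, int]) -> int:
    """Translate a virtual address using a single-level page table.

    Raises LookupError when the page is not present, which stands in
    for the page fault a real processor would raise.
    """
    virtual_page = virtual_address >> OFFSET_BITS
    offset = virtual_address & (PAGE_SIZE - 1)   # passed through unchanged
    if virtual_page not in page_table:
        raise LookupError(f"page fault: virtual page {virtual_page:#x} not in memory")
    physical_page = page_table[virtual_page]
    return (physical_page << OFFSET_BITS) | offset


if __name__ == "__main__":
    # Hypothetical mapping: virtual page 12345H lives in physical page 42H.
    table = {0x12345: 0x42}
    print(f"{translate(0x12345678, table):X}H")
```

With a 12-bit offset the low three hexadecimal digits of the address (678H in the example) are copied unchanged, and only the upper digits are replaced by the physical page number.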
Fig. 4.35 Virtual to physical address translation

The page table is used to keep the information about the main memory location of each page. This information includes the main memory address where the page is stored and the current status of the page. To obtain the address of the corresponding entry in the page table, the virtual page number is added to the contents of the page table base register, in which the starting address of the page table is stored. The entry in the page table gives the physical page number, to which the offset is added to get the physical address in the main memory.

If the page required by the processor is not in the main memory, a page fault occurs and the required page is loaded into the main memory from secondary storage by a special routine called the page fault routine. This technique of getting the desired page into the main memory is called demand paging.

To support demand paging and virtual memory, the processor has to access the page table, which is kept in the main memory. To avoid this access time and the resulting degradation of performance, a small portion of the page table is held in the memory management unit. This portion is called the translation lookaside buffer (TLB) and it holds the page table entries that correspond to the most recently accessed pages. When the processor finds a page table entry in the TLB, it does not have to access the page table, which saves substantial access time.

Review Questions

1. Draw the programmer's model of Pentium processor in protected mode.
2. What is segmentation ?
3. Explain the necessity of protection in Pentium processor.
4. Assume (DS) = 0204H, (ESI) = 00002000H, paging is disabled and mode is protected mode.
   a. From which of the three descriptor tables (IDT, LDT, GDT) will the descriptor be taken ?
Give the descriptor entry number.
   b. Assume appropriate values in the descriptor selected and explain how the address translation takes place when the following instruction is executed.
      MOV AX, [SI]
5. Explain the function of the TI and RPL bits.
6. State how the granularity bit affects the limit field.
7. Explain the meaning and usage of 'Expand down' segments. How are the base and limit fields interpreted for these segments ?
8. What are the various fields in a page directory entry and a page table entry ? What are their uses ?
9. Explain the functions of RPL, CPL and DPL.
10. What is the purpose of the TLB and the descriptor cache ? How do they reduce system overheads ?
11. Explain with an example how a logical address is converted with respect to PDE, PTE (Page Table Entry), page frame and GDT/LDT. Assume suitable data and state the values you have chosen.
12. What is the meaning of privileged instructions in Pentium processor ? State whether or not PUSH and POP are privileged instructions.
13. State the privilege rules for 1. Accessing data in code segment, 2. Control transfer.
14. Discuss the mechanism by which a Pentium processor user operating at PL3 or PL2 can call procedures at a higher privilege level through CALL gates. Outline clearly the checks made by Pentium processor.
15. Why does Pentium processor support different stacks when it changes privilege level during CALLs ? What parameters are saved ? Give details.
16. What is the function of the NT bit ?
17. What is the difference between a conforming code segment and a non-conforming code segment ?
18. How many global descriptors can be stored in the GDT ? Justify.
19. How does the selector choose the local descriptor table and how does Pentium processor access it ?
20. Explain in detail segment level protection using the privilege mechanism.
21. What is the purpose of the Word Count field in the call gate descriptor ?
22. What do you mean by segment descriptor cache ? Explain the use of it.
23. Explain the purpose, structure and locations of various descriptor tables used in Pentium processor.
24. What are privileged and sensitive instructions ?
25. Explain the page level protection mechanism in Pentium processor.
26. Write down the procedure to enter protected mode from real mode.
27. Write down the steps to switch back to real mode from protected mode.
28. Write a short note on virtual memory.

Multitasking

5.1 Introduction

Microcomputer systems are often shared by several computer operators, or users. Each user commonly has a terminal that is connected to the computer and is allowed to do his own work. For example, one user in the personnel department may calculate the company payroll, another in finance may do financial estimates, and another in engineering may use the computer to do CAD work. All these users are using the same computer at the same time. Now the question is how one computer can serve so many users. The answer is that a digital computer runs at extremely high speed and thus it can share its time among more than one user. Actually the computer serves only one user for a short time and then moves on to the next user, and so on. Each user is allotted a time slice. This time slice is a small fraction of a second. The computer allocates this time to all users in rotation. This technique is known as time sharing and this is how nearly all computers support multiple user operations.

Time Sharing
• Allows multiple users to use the same computer
• Provides economical use of processing resources
• Is invisible to the users
• Can work for any number of users

An operating system which coordinates the actions of a time-share system such as this is referred to as a multi-user operating system. The program or section of program for each user is referred to as a task, so a multi-user operating system implies multitasking. But the reverse is not necessarily true.

Multi-user Versus Multitasking
• Multi-user = Many different people using one computer.
• Multitasking = Many different tasks on one computer.
• A multi-user computer can perform many tasks for many users.
• A multitasking computer can perform many tasks for one user.

5.2 Scheduling Methods for Multi-user Operating System

There are different approaches for implementing a multi-user operating system.
1. Time-slice scheduling
2. Pre-emptive priority based scheduling

5.2.1 Time-Slice Scheduling

The specific component of the operating system which determines when it is time to switch from one task to another is called the scheduler, dispatcher or supervisor. In the previous discussion we have seen the time-slice method, in which the CPU executes one task for a small period of time (a fraction of a second) and then switches to the next task. After executing all tasks in sequence, the CPU returns to the first task. The advantage of the time-slice approach in a multi-user system is that all users are serviced at approximately equal time intervals. If the number of users is large, each user gets fewer time slices in a given period, so each user's program takes more time to execute. This is referred to as system degradation. For this reason, systems with a large number of users prefer the pre-emptive priority based scheduling approach.

5.2.2 Pre-emptive Priority Based Scheduling

In this system, each task is given a priority number and higher priority tasks are allowed to interrupt lower priority tasks. This means that when a lower priority task is in execution, a higher priority task can take control, and after completion of the higher priority task, control returns to the lower priority task. This approach is suitable for most applications, because it allows the most important tasks to be done first.

5.2.3 Context Switching

Each task uses registers, data pointers, memory pointers, memory variables, a stack area, etc. This is referred to as the environment or context of that task.
When a task switch occurs, the environment or context of the interrupted task must be saved so that the task can be continued properly when it receives another time slice. This environment, and a pointer to it, is usually stored in a special memory segment or on a stack. When it is necessary to switch back to the task again, the operating system uses the pointer to access the saved environment. This process is known as context switching.

5.3 Support Registers and Related Descriptors for Multitasking

The Pentium processor has special registers and special descriptors to support an efficient and protected multitasking system. These are :
• Task State Segment (TSS)
• Task State Segment Descriptor
• Task Register
• Task Gate Descriptor

With these registers and data structures the Pentium processor switches execution from one task to another, saving the environment of the current task so that the task can be continued later. Apart from the simple task switch, the Pentium processor supports two other task management features :
• Interrupts and exceptions can cause task switches. The Pentium processor not only switches automatically to the task that handles the interrupt or exception, but also switches back automatically to the interrupted task when the interrupt or exception has been serviced. Interrupt tasks may interrupt lower-priority interrupt tasks to any depth.
• As each task can have a separate LDT and page directory, it can have a different logical-to-linear mapping and a different linear-to-physical mapping. Due to this, tasks can be isolated and prevented from interfering with one another.

5.3.1 Task State Segment (TSS)

Fig. 5.1 shows the format of a TSS. It is a special type of segment, used to manage the task. The Pentium processor uses the TSS like a scratch-pad. It stores everything it needs to know about a task in the TSS. This means that the task environment (context) is stored in the TSS.
The TSS is not accessible to general user programs, or even to programs at privilege level 0. The fields within the TSS are accessible only to the Pentium processor. The fields of a TSS are divided into two sets : the dynamic set and the static set.

1. Dynamic Set : The Pentium processor updates the dynamic set when it switches from one task to another. This set includes :
• The general registers (EAX, EBX, ECX, EDX, ESP, EBP, ESI, EDI)
• The segment registers (ES, CS, SS, DS, FS, GS)
• The flag register (EFLAGS)
• The instruction pointer (EIP)
• Back link

The first four fields (general registers, segment registers/selectors, flags and instruction pointer) save the state of the Pentium processor. Saving EIP guarantees that the task can be restarted at the point at which it was stopped, and saving EFLAGS allows the Pentium processor to execute conditional instructions properly when the task is restarted. The back link is used by the Pentium processor to keep track of the previous task. When a return instruction is executed at the end of the new task, the back link selector for the previous TSS is automatically loaded into the task register. This activates the previous task and restores the prior program environment.

Fig. 5.1 Task state segment

2. Static Set : The Pentium processor only reads fields from this set. This set includes :
• The selector for the task's LDT
• The register (PDBR) that contains the base address of the task's page directory
• Pointers to the stacks for privilege levels 0-2
• The T-bit (debug trap bit), which causes the Pentium processor to raise a debug exception when a task switch occurs
• The I/O map offset
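To make the two sets concrete, the sketch below lists the TSS fields in memory order and derives their offsets in Python. It assumes the standard 32-bit TSS layout, in which each of the 26 fields occupies one doubleword (16-bit selectors are padded with zeros), giving 104 bytes in total. The field names and the helper `field_offset` are labels chosen for this example.

```python
import struct

# Fields in memory order; each occupies one 32-bit doubleword.
TSS_FIELDS = [
    "back_link",                                  # dynamic set
    "esp0", "ss0", "esp1", "ss1", "esp2", "ss2",  # static set: stacks for PL0-PL2
    "cr3",                                        # static set: PDBR
    "eip", "eflags",                              # dynamic set
    "eax", "ecx", "edx", "ebx",
    "esp", "ebp", "esi", "edi",
    "es", "cs", "ss", "ds", "fs", "gs",
    "ldt",                                        # static set: LDT selector
    "trap_and_io_map_base",                       # static set: T-bit and I/O map offset
]

TSS_FORMAT = "<" + "I" * len(TSS_FIELDS)   # little-endian doublewords
TSS_SIZE = struct.calcsize(TSS_FORMAT)


def field_offset(name: str) -> int:
    """Byte offset of a field from the base of the TSS."""
    return 4 * TSS_FIELDS.index(name)


if __name__ == "__main__":
    print(f"TSS size: {TSS_SIZE} bytes")
    for name in ("back_link", "esp0", "cr3", "eip", "ldt"):
        print(f"{name:>10} at offset {field_offset(name):02X}H")
```

Packing a saved context is then a single `struct.pack(TSS_FORMAT, *values)` call with one value per field, which mirrors how the processor writes the dynamic set on a task switch.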
Note : The TSS static set saves the selector for the task's LDT. This means that TSS descriptors must appear only in the GDT.

Task switching may change the privilege level, changing the addressable domain of the program. Since the rule says that the privilege level of the stack segment must exactly match the privilege level of the code segment at all times, the Pentium processor has to change stacks when there is a change in privilege level. The previous stack segment and pointer are therefore set aside, and a new stack is used that corresponds to the new privilege level. When control is returned to the previous level, the previous stack is restored. For this purpose, the fields ESP0, ESP1, ESP2, SS0, SS1 and SS2 hold the stack pointers and stack selectors for privilege levels 0, 1 and 2.

The I/O map base holds the 16-bit offset of the beginning of the I/O permission bit map. It is implemented on a task-by-task basis and affects the hardware privilege checking only for I/O instructions. The privilege checking mechanism for I/O is described in the previous section of this chapter.

5.3.2 TSS Descriptor

Like other segments, the task state segment is defined by a descriptor, called the TSS descriptor. Fig. 5.2 shows the task state segment descriptor. It contains fields like those of other segment descriptors. The B-bit in the type field indicates whether the task is busy. Tasks are not re-entrant. The B-bit allows the Pentium processor to detect an attempt to switch to a task that is already busy. The BASE, LIMIT and DPL fields and the G-bit and the P-bit have functions similar to those in other descriptors. The limit field, however, must have a value equal to or greater than 103 (104 - 1), because the Pentium processor requires a minimum of 104 bytes of storage in order to perform a context save. A larger limit is permissible and it is required if an I/O permission map is present. The maximum limit for a TSS is 4 Gbytes.
Fig. 5.2 Task state segment descriptor

To access a TSS descriptor, the procedure must have a privilege level numerically less than or equal to the privilege level specified by the DPL field of the TSS descriptor. Usually this access is restricted to trusted software, whose privilege level is zero. This can be done by setting the DPL field of the TSS descriptor to zero. Thus only trusted software has the right to perform task switching.

5.3.3 Task Register (TR)

The Task Register (TR) specifies the currently executing task by pointing to the TSS. Fig. 5.3 shows the path by which the Pentium processor accesses the current task. The task register holds a selector for the TSS.

Fig. 5.3 Task register

It has both a visible portion, which can be read and changed by instructions, and an invisible portion, which is maintained by the Pentium processor to correspond to the visible portion and cannot be read by any instruction. The selector in the visible portion is used to specify a TSS descriptor in the GDT, and the invisible portion is used to cache the base and limit values from the TSS descriptor. Holding the base and limit in the invisible portion of the task register makes execution of the task more efficient, because the processor does not need to repeatedly fetch these values from memory when it references the TSS of the current task.

The Pentium processor provides two instructions to read and modify the visible portion of the task register : LTR (Load Task Register) and STR (Store Task Register).

LTR (Load Task Register) : It loads the visible portion of the task register with the selector and the invisible portion with information from the TSS descriptor selected by the selector. LTR is a privileged instruction.
Thus it is executed only when CPL is zero.

STR (Store Task Register) : It stores the visible portion of the task register in a general register or memory word. STR is not a privileged instruction.

5.3.4 Task Gates and Task Gate Descriptor

Task gates, like call gates, are special system gates. A task gate has its own descriptor. A task gate descriptor does not define a memory segment but instead acts as an interface point between user code and a task state segment. It provides an indirect and protected reference to a TSS. Fig. 5.4 shows the format of a task gate descriptor. A task gate descriptor holds a selector to a TSS descriptor, which uniquely identifies a task. Like the selector to a call gate, the selector to a task gate can be used in place of a selector to a code segment in FAR JMP and FAR CALL instructions.

Fig. 5.4 Task gate descriptor

The DPL field of a task gate controls the right to use the descriptor to cause a task switch. A procedure can select a task gate descriptor only when the maximum of the selector's RPL and the CPL of the procedure is numerically less than or equal to the DPL of the descriptor.

MAX (CPL, RPL) ≤ task gate DPL

If the DPL of the task gate is 0, this privilege constraint prevents untrusted procedures (procedures having privilege levels from 1 to 3) from causing a task switch. But through task gates we can switch from a lower privilege level to a higher one, because when a task gate is used, the DPL of the target TSS descriptor is not used for privilege checking. Thus a procedure that has access to a task gate has the power to cause a task switch.

5.4 Task Switching

It is important to note that after every task switch, i.e. after loading a new context from a TSS and updating TR, the Pentium processor marks the new TSS as "busy".
It does this by setting bit 41 in the TSS descriptor of the currently running task. Therefore the currently running task is always a busy task. The Pentium processor cannot switch into a task that is busy. Tasks are not reentrant, and task switches therefore cannot be recursive.

The Pentium processor performs a task switch in any of four cases :

1. A long jump or call instruction contains a selector which refers to a TSS descriptor. This is the simplest method and can be easily implemented by the operating system kernel at the end of a time slice.
2. The selector in a long jump or call instruction refers to a task gate. In this case the selector for the destination TSS is in the task gate. This indirect method has advantages regarding privilege levels and protection.
3. The interrupt selector refers to a task gate in the interrupt descriptor table. The task gate contains the selector for the new TSS. If the access passes all the privilege level tests, the selector and descriptor for the interrupt task are loaded into the task register, and the nested task (NT) bit in the EFLAGS register is set.
4. An IRET instruction is executed with the NT bit in the EFLAGS register set. The IRET instruction uses the back link selector in the TSS to return execution to the interrupted task.

5.4.1 Task Switching Without Task Gate

Fig. 5.5 shows the task switch operation.

Fig. 5.5 Task switch operation

Steps Involved in Task Switching (Without Task Gate)

1. Privilege Check : The current task is checked to see whether it is allowed to switch to the designated task. This is done by comparing the DPL of the designated TSS descriptor with the RPL and CPL of the current task.
If the DPL of the TSS descriptor is numerically greater than or equal to the maximum of the CPL and the RPL of the selector, the current task is allowed to switch to the designated task.
2. Limit and Present Bit Checking : The TSS descriptor for the designated task is checked for its limit and presence.
3. Saving the State of the Current Task : The Pentium processor finds the base address of the current TSS cached in the task register. It copies the registers into the current TSS (EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI, ES, CS, DS, SS, FS, GS, the flag register and EIP). The EIP field of the TSS points to the instruction after the one that caused the task switch. The selector for the current task is saved as a back link selector in the new task.
4. Loading of Task Register : The visible portion of the task register is loaded with the selector of the designated task's TSS descriptor. This sets the TS (Task Switched) bit in the Machine Status Word (MSW). The TS bit is useful to systems software when a coprocessor is present : it signals that the context of the coprocessor may not correspond to the current Pentium processor task. The B bit in the new task's descriptor is marked busy. Then the corresponding TSS descriptor is read from the GDT and loaded into the task register cache (the hidden portion of the task register).
5. Resuming Execution : Finally, the Pentium processor starts execution of the designated task at the instruction pointed to by the new contents of the code segment selector (CS) and instruction pointer (EIP).

The old program environment is preserved by saving the selector for the old TSS as the back link selector in the new TSS. By executing an IRET instruction at the end of the new task, the back link selector for the old TSS is automatically reloaded into TR, and program execution resumes at the point where it left off in the old task.

5.4.2 Task Switching with Task Gate

In this case, the indirect method is used for task switching.
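The privilege rule that guards a task switch, MAX (CPL, RPL) <= DPL, is the same comparison whether the DPL comes from a TSS descriptor (direct switch) or from a task gate (indirect switch). A minimal Python sketch of that comparison follows; the function name `task_switch_allowed` and its parameters are illustrative assumptions and do not model any other check (limit, present bit, busy bit).

```python
def task_switch_allowed(cpl, rpl, dpl):
    """Return True when MAX(CPL, RPL) <= DPL, compared numerically.

    cpl: current privilege level of the running procedure (0..3)
    rpl: requested privilege level of the selector used (0..3)
    dpl: descriptor privilege level of the TSS descriptor or task gate (0..3)
    """
    for level in (cpl, rpl, dpl):
        if level not in (0, 1, 2, 3):
            raise ValueError("privilege levels must be in the range 0..3")
    return max(cpl, rpl) <= dpl


# A privilege-level 3 procedure cannot use a TSS descriptor whose DPL is 0 ...
print(task_switch_allowed(cpl=3, rpl=3, dpl=0))  # False
# ... but it can use a task gate whose DPL is 3.
print(task_switch_allowed(cpl=3, rpl=3, dpl=3))  # True
```

Because a larger number means less privilege, taking the maximum of CPL and RPL selects the weaker of the two levels before it is compared with the DPL.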
Task switching is done by jumping to or calling a task gate. Fig. 5.6 shows task switching through a task gate.

Steps Involved in Task Switching (Using Task Gate)

1. Privilege Check : When a task gate is used, the DPL of the new TSS descriptor is not used for privilege checking. Instead, the DPL of the task gate is compared with the CPL and the RPL of the gate selector. If the DPL of the task gate is numerically greater than or equal to the maximum of the CPL and the RPL of the gate selector, the current task is allowed to switch to the designated (new) task.

The remaining steps are the same, except that the selector for the TSS descriptor loaded into TR is taken from the task gate instead of from the CALL or JMP instruction.

In the case of exceptions, interrupts and IRETs, the current task is allowed to switch to the new task regardless of the DPL of the new task gate or TSS descriptor.

Fig. 5.6 Task switching through task gate (task gates in the Local Descriptor Table and Interrupt Descriptor Table refer to a task descriptor in the Global Descriptor Table, which points to the Task State Segment)

5.4.3 Nested Tasks

Nested tasks are analogous to nested subroutines. If a task switch was caused by a FAR CALL instruction or by an interrupt or exception (fault or trap), the new task is considered to be nested within the old task that invoked it. In any of these cases, when the new task executes an IRET instruction, the Pentium processor automatically switches back to the task that invoked it. To do so, there is a mechanism for linking tasks, which is the equivalent of a call/return stack. The task linking mechanism consists of the Back Link and the NT (Nested Task) flag. The Back Link is used to keep track of the previous task. When an IRET instruction is executed at the end of the new task, the back link selector for the previous TSS is automatically loaded into the task register. This activates the previous task and restores the prior program environment.
The Pentium processor sets the NT (Nested Task) flag in the EFLAGS register when one system task invokes another task. The Pentium processor uses NT as a flag so that it can tell whether the Back Link field in the current TSS is valid, should it encounter an IRET instruction. This is the only means by which the Pentium processor determines whether it should perform a task switch or a normal IRET. A RET instruction does not 'unnest' tasks, even if they were nested by CALL instructions. Only IRET can 'unnest' tasks.

Nested Task Switches
- Nested tasks act like subroutines.
- A CALL instruction to a task gate will nest tasks.
- An interrupt or exception to a task gate will nest tasks.
- A JMP instruction will not nest tasks.
- The new TSS gets the old TSS selector in its Back Link field.
- The new task gets the nested task bit set in the EFLAGS register.
- The new task must return to the old task with an IRET instruction.

5.5 I/O Protection

The Pentium processor supports two mechanisms for protecting I/O ports in protected mode :

1. The IOPL field in the EFLAGS register defines the right to use I/O related instructions (I/O privilege level).
2. The I/O permission bit map of a Pentium processor TSS defines the right to use ports in the I/O address space.

5.5.1 I/O Privilege Level

In this mechanism, for execution of the IN, INS, OUT, OUTS, CLI and STI instructions, the CPL of a procedure or task must be the same as or a lower number than the number represented by the IOPL bits (CPL <= IOPL).

Fig. 5.7 I/O address bit map (the I/O bit-map offset field in the TSS locates the bit map)

The I/O permission bit map is a bit vector. The size of the map and its location in the TSS (Task State Segment) are variable. The Pentium processor locates the I/O permission map by means of the I/O map base field, which is in the fixed portion of the TSS. Each bit in the map corresponds to an I/O port byte address.
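The bit-per-port layout can be illustrated with a short Python sketch. It builds the map asked for in Example 5.1 (256 ports, with ports 21H to 2FH denied) and tests a port access against it. The helper names `build_bitmap` and `io_access_allowed`, the bit ordering within a byte (port n in bit n mod 8 of byte n div 8), and the treatment of ports beyond the end of the map as denied are assumptions made for this illustration, not details given in the text.

```python
def build_bitmap(num_ports, denied_ports):
    """Build an I/O permission bit map: bit = 1 denies the port, bit = 0 allows it."""
    bitmap = bytearray((num_ports + 7) // 8)
    for port in denied_ports:
        bitmap[port // 8] |= 1 << (port % 8)
    return bytes(bitmap)


def io_access_allowed(bitmap, port, width=1):
    """Check an access of `width` bytes (1, 2 or 4) starting at `port`.

    Every byte address touched by the access must have a 0 bit in the map.
    """
    for p in range(port, port + width):
        byte_index, bit_index = divmod(p, 8)
        if byte_index >= len(bitmap):
            return False  # assumption: ports outside the map are denied
        if (bitmap[byte_index] >> bit_index) & 1:
            return False
    return True


# Example 5.1: ports 00H..FFH, all accessible except 21H..2FH.
bitmap = build_bitmap(256, range(0x21, 0x30))
print(io_access_allowed(bitmap, 0x20))           # True: port 20H is open
print(io_access_allowed(bitmap, 0x21))           # False: port 21H is blocked
print(io_access_allowed(bitmap, 0x20, width=2))  # False: word access touches 21H
```

A 16-bit access consults two consecutive bits and a 32-bit access consults four, so a single 1 bit anywhere in the range is enough to deny the whole access.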
Thus 16-bit ports use 2 bits each and 32-bit ports use 4 bits each. To access an I/O port, the corresponding bits in the I/O bit map must be 0. When a program attempts to access a port, the Pentium processor first compares the CPL of the task with the IOPL. If the CPL is numerically less than or equal to the IOPL, the access is allowed. Otherwise the Pentium processor checks the map bits corresponding to the addressed port, and access is granted only if those bits are 0.

Example 5.1 : A Pentium processor system has 256 I/O ports with addresses from 00H to FFH. All these ports except 21H to 2FH are to be made accessible to a user at PL3. Show how the I/O permission bit map looks.

Solution :

i) I/O permission bit map (each row covers 16 ports, highest port address on the left) :

FF  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  F0
EF  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  E0
DF  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  D0
CF  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  C0
BF  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  B0
AF  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  A0
9F  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  90
8F  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  80
7F  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  70
6F  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  60
5F  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  50
4F  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  40
3F  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  30
2F  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0  20
1F  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  10
0F  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  00

Review Questions

1. Write whether multitasking and multi-user systems are the same. Justify your answer.
2. Explain the methods by which a task switch is forced.
3. What is context switching ?
4. Explain the Task Gate Descriptor.
5. What is the purpose of the Task Register ?
6. What is a Task State Segment (TSS) ? Give the format of the TSS descriptor. How does it differ from a gate descriptor ?
7. Explain how the Pentium processor carries out task switching using on-chip data structures and various registers.
8.
What are the various data structures and registers that support multitasking in Pentium processors ?
9. Write a short note on nested tasks.

Virtual Mode

6.1 Introduction

In a multitasking system, it is often necessary to switch back and forth between real mode and protected mode, because such a system runs a mixture of tasks : some use segment-offset addressing (real mode addressing) and some use descriptors (protected mode addressing). The 8086 virtual mode solves this problem. A Pentium operating in protected mode can easily switch to virtual 8086 mode to execute a time slice of an 8086 program and then easily switch back to protected mode to execute a time slice of a protected mode task.

The Pentium allows execution of one or more 8086, 8088, 80186 or 80188 programs in a Pentium protected mode environment, as different tasks in the virtual 8086 mode. In 8086 virtual mode, the Pentium treats the segment registers exactly the same way as it does in real mode. Therefore, the address range of a virtual 8086 mode task is 1 Mbyte. The segment and offset registers together give a linear address instead of a physical address. The physical address is generated from the linear address with the help of page translation. Thus the physical address may be anywhere in the 4 gigabyte memory addressable by the Pentium.

In 8086 virtual mode, the Pentium provides mechanisms to selectively trap and manage input/output and interrupt activity. Software can set the Input/Output Privilege Level (IOPL), which selectively controls input/output transfers, and can use the input/output port permission map to selectively control access to input/output ports.

This chapter covers the following topics concerning 8086 virtual mode.
- Entering and Leaving 8086 Virtual Mode
- Registers and Instructions
- Address calculations in 8086 Virtual Mode
- Paging in virtual 8086 mode
- Protection and I/O permission bitmap in virtual 8086 mode

6.2 Entering and Leaving 8086 Virtual Mode

The Pentium enters or leaves 8086 virtual mode for any of the three reasons shown in Fig. 6.1.

Fig. 6.1 Entering and leaving an 8086 program

1. An interrupt that vectors to a task gate.
2. An action of the scheduler of the Pentium operating system.
3. An IRET when the NT (Nested Task) flag is set.

6.2.1 Entering 8086 Virtual Mode

The Pentium can enter 8086 virtual mode by either of two means :

1. A task switch to a Pentium task loads the image of EFLAGS from the new TSS. If the VM bit in the EFLAGS image is set, the Pentium enters virtual 8086 mode to execute the new task. If the VM bit is not set, the Pentium executes the new task as a normal protected mode task.
Note : If the TSS of the new task is an 80286 TSS, the Pentium does not enter 8086 virtual mode, because the 80286 TSS does not store the high-order word of EFLAGS, which contains the VM flag.
2. An IRET from a procedure loads the EFLAGS image from the stack; it changes the VM bit only if the Current Privilege Level (CPL) at the time of the IRET is zero. If the new value of the VM bit is 1, the Pentium enters 8086 virtual mode.

6.2.2 Leaving 8086 Virtual Mode

The Pentium leaves 8086 virtual mode when an interrupt or exception occurs.

1. A task switch from a virtual 8086 task to any other task, caused by an interrupt or exception, loads EFLAGS from the TSS of the new task. If the new TSS is a Pentium TSS and the VM bit is zero, or if the new TSS is an 80286 TSS, the Pentium clears the VM bit of EFLAGS.
It then loads the segment registers as defined by the new TSS and begins executing the instructions of the new task according to Pentium protected mode semantics.
2. An interrupt or exception which vectors to a privilege-level zero procedure stores the current setting of EFLAGS on the stack, then clears the VM bit. As the VM bit is zero, the Pentium starts executing the handler's instructions in its protected mode environment.

6.3 Registers and Instructions

6.3.1 Registers

The virtual 8086 mode register set includes :

1. All the registers defined for the 8086, plus
2. The new registers introduced by the Pentium : FS, GS, debug registers, test registers and control registers.

6.3.2 Instructions

In virtual mode, the Pentium can execute normal 8086 instructions as well as the new instructions introduced by the 80186/80188, 80286 and Pentium, as listed below. The new instructions and new override prefixes allow use of the FS and GS segment registers. Instructions can utilize 32-bit operands through the use of the operand size prefix.

1. New instructions introduced by 80186/80188 and 80286
   - PUSH immediate data
   - PUSH ALL and POP ALL (PUSHA and POPA)
   - Multiply immediate data
   - Shift and rotate by immediate count
   - String I/O
   - ENTER and LEAVE
   - BOUND
2. New instructions introduced by Pentium
   - LSS, LFS, LGS instructions
   - Long displacement conditional jumps
   - Single bit instructions
   - Bit scan
   - Byte set on condition
   - Double shift instructions
   - Move with sign/zero extension
   - Generalized multiply

Note : Only 8086 addressing modes can be used with these instructions.

6.4 Address Generation in 8086 Virtual Mode

In virtual 8086 mode, the contents of a segment register are not used as a selector pointing to a descriptor. Instead, the segment register contents are combined with an offset to generate a linear address.
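The address arithmetic of virtual 8086 mode (segment register contents shifted left by 4 bits, plus a 16-bit offset) can be sketched in a few lines of Python. The function name `v86_linear_address` is an illustrative assumption; the sketch covers only the segment:offset step and does not model page translation.

```python
def v86_linear_address(segment, offset):
    """Linear address = (segment << 4) + offset, for 16-bit segment and offset."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must both fit in 16 bits")
    return (segment << 4) + offset


print(hex(v86_linear_address(0x1234, 0x0010)))  # 0x12350
# The largest possible result needs 21 bits: 1 Mbyte plus almost 64 Kbytes.
print(hex(v86_linear_address(0xFFFF, 0xFFFF)))  # 0x10ffef
```

The second call shows why the linear address range extends to 10FFEFH: the carry out of bit 19 is kept rather than discarded.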
The linear address is generated by shifting the contents of the appropriate segment register left by 4 bits and adding the result to the effective address (offset). Fig. 6.2 shows virtual 8086 mode address generation.

Fig. 6.2 Virtual 8086 mode address generation (16-bit segment register contents shifted left by 4 bits, plus a 16-bit effective address, give the linear address)

If a carry is generated by the addition of the shifted segment register contents and the effective address, then, unlike on the 8086, the resulting 21-bit address is a valid linear address. A Pentium in virtual 8086 mode is allowed to generate linear addresses anywhere in the range 0 to 10FFEFH (one megabyte plus approximately 64 Kbytes) of the task's linear address space.

Virtual 8086 tasks generate 32-bit linear addresses. While an 8086 program can only utilize the low-order 21 bits of a linear address, the linear address can be mapped via page tables to any 32-bit physical address.

Unlike the 8086 and 80286, the Pentium can generate 32-bit effective addresses with the address size prefix. Such an address must not exceed 65535, to maintain compatibility with 80286 real mode; otherwise the Pentium generates a pseudo-protection fault (INT 12 or INT 13 with no error code).

6.5 Paging in Virtual Mode

Although protected mode memory segmentation is not used while the Pentium is operating in virtual 8086 mode, the paging portion of the Pentium does work. The paging hardware allows multiple virtual mode tasks to run concurrently, and provides protection and operating system isolation. It is not necessary to have the paging hardware enabled to run virtual mode tasks; however, paging is useful or necessary for any of the following reasons :

1. The paging mechanism is needed in order to run multiple virtual mode tasks, as shown in Fig. 6.3.

Fig. 6.3 Multiple virtual 8086 tasks (the linear addresses of each V86 task, up to FFFFF, are mapped through page tables to different physical addresses)

2.
It is used to relocate the address space of a virtual 8086 mode task to physical address space greater than one megabyte.
3. The paging mechanism allows the 20-bit linear address space of a virtual 8086 program to be divided into up to 256 pages. Each of these pages can be located anywhere within the maximum 4 Gbyte physical address space of the Pentium.
4. Since CR3 (the page directory base register) is loaded by a task switch, each virtual 8086 mode task can use a different mapping scheme to map pages to different physical locations.
5. The paging mechanism allows the 8086 operating system code to be shared between multiple 8086 applications.

6.6 Protection and I/O Permission Bitmap

Virtual 8086 programs or tasks do not make any distinction between code space, data space, and stack space. There are no upper and lower bounds on a segment; the address is generated using segment and index registers. There is no such thing as a not-present segment or a privileged segment. All virtual 8086 mode programs execute at privilege level 3, the level of least privilege. These programs are limited to the first 1 MB of the linear address space.

Whenever the VM bit is set, the processor is operating in virtual 8086 mode and the effective CPL is 3. Thus, an attempt to execute a privileged instruction (an instruction that can be executed only at privilege level 0) in virtual 8086 mode will cause exception 13.

We know that the Pentium has several IOPL-sensitive instructions. Recall that IOPL is a 2-bit field in EFLAGS that specifies the minimum privilege level required to execute certain I/O related instructions. Because a VM86 program has a fixed privilege level of 3, it is never able to alter the IOPL bits, and so it is granted or denied I/O permission when its TSS is first created.

In virtual 8086 mode, both the IOPL field and the I/O permission bit map are used, but both have very different functions than they do in protected mode.
In virtual 8086 mode, IOPL controls the right to execute the following instructions only.

- CLI - Clear interrupt enable flag
- STI - Set interrupt enable flag
- LOCK - Asserts bus lock signal
- PUSHF - Push flags
- POPF - Pop flags
- INT n - Software interrupts
- IRET - Interrupt return

Note that the actual I/O instructions IN, OUT, INS and OUTS are not controlled by IOPL. Instead, these four instructions are controlled solely by the VM86 task's I/O permission map (if one is defined). If the bit corresponding to the I/O location being accessed is clear, I/O access is permitted; otherwise, the I/O instruction causes a general protection fault.

Review Questions

1. Write down the steps to enter and leave the Virtual 8086 Mode.
2. List the instructions in Virtual 8086 Mode.
3. Describe how the physical address is obtained in Virtual 8086 Mode.
4. Write notes on
   a. Virtual 8086 mode.
   b. Multiple virtual 8086 mode tasks.
   c. Input/Output in virtual mode.
   d. Paging in virtual 8086 mode.
   e. Protection and I/O permission bit map in VM86 mode.

Interrupts, Exceptions and I/O

7.1 Introduction

Sometimes it is necessary to have the computer automatically execute one of a collection of special routines whenever certain conditions exist within a program or the microcomputer system. For example, a microcomputer system should respond to devices such as a keyboard, sensors and other components when they request service.

The most common method of servicing such devices is the polled approach. Here the processor must test each device in sequence and in effect "ask" each one whether it needs communication with the processor. It is easy to see that a large portion of the main program is spent looping through this continuous polling cycle. Such a method has a serious and detrimental effect on system throughput, thus limiting the tasks that can be assumed by the microcomputer and reducing the cost effectiveness of using such devices.
A more desirable method would be one that allows the microprocessor to execute its main program and stop to service peripheral devices only when it is told to do so by the device itself. In effect, the method would provide an external asynchronous input that informs the processor that it should complete whatever instruction is currently being executed and fetch a new routine that will service the requesting device. Once this servicing is completed, the processor would resume exactly where it left off. This method is called the interrupt method. It is easy to see that system throughput would drastically increase, enhancing cost effectiveness.

Most microprocessors allow execution of special routines by interrupting normal program execution. When a microprocessor is interrupted, it stops executing its current program and calls a special routine which "services" the interrupt. The event that causes the interruption is called an interrupt, and the special routine which is executed is called the interrupt service routine (ISR) or interrupt service procedure. An interrupt causes the microprocessor to temporarily suspend execution of the current program and forces it to jump to another program, the ISR. At the completion of the ISR, the microprocessor must return to the original program flow at the point where it was interrupted.

A normal program can be interrupted in three ways :

1. By an external signal
2. By a special instruction in the program or
3. By the occurrence of some condition.

To handle such interruptions, the Pentium provides two special kinds of control transfer : interrupts and exceptions. Exceptions differ from interrupts. Interrupts are used to handle asynchronous events external to the processor, whereas exceptions handle conditions detected by the processor itself in the course of executing instructions.

1. Interrupts (Hardware)
- Maskable interrupts, which are routed via the INTR pin.
- Nonmaskable interrupts, which are routed via the NMI (Non-Maskable Interrupt) pin.

Hardware interrupts occur as the result of an external event. These interrupts are serviced after the execution of the current instruction. After the interrupt handler has finished servicing the interrupt, execution proceeds with the instruction immediately after the interrupted instruction.

2. Exceptions

Exceptions are detected by the processor. They are further classified as faults, traps, or aborts depending on the way they are reported and on whether restart of the instruction causing the exception is supported.

Faults : Faults are exceptions that are detected and serviced before the execution of the faulting instruction. For example, in a virtual memory system, if a page or segment referenced by the processor is not present, the operating system fetches the page or segment from disk in the fault exception routine, and the Pentium then restarts the instruction using the referenced page or segment.

Traps : Traps are exceptions that are reported immediately after the execution of the instruction which causes the problem. User defined interrupts are examples of traps.

Aborts : Aborts are exceptions which do not permit the precise location of the instruction causing the exception to be determined. Aborts are used to report severe errors, such as hardware errors or illegal values in the system tables.

Programmed/Software Interrupts : The instructions INTO, INT 3, INT n and BOUND can trigger exceptions. These instructions are often called "software interrupts", but the Pentium handles them as exceptions.

7.2 Interrupts and Exception Conditions in Pentium

Interrupt 0 : Divide Error

When the quotient from either a DIV or IDIV instruction is too large to fit in the result register, or the divisor is zero, the Pentium automatically triggers a type 0 interrupt.
Interrupt 1 : Debug Exceptions

The Pentium triggers this interrupt for any of the following conditions; whether the exception is a fault or a trap depends on the condition :

- Instruction address breakpoint fault.
- Data address breakpoint trap.
- General detect fault.
- Single-step trap.
- Task-switch breakpoint trap.

The Pentium does not push an error code for this exception. An exception handler can examine the debug registers to determine which condition caused the exception.

Interrupt 2 : Non Maskable Interrupt

As the name suggests, this interrupt cannot be disabled by any software instruction. It is activated by a low to high transition on the Pentium NMI input pin. In response, the Pentium triggers a type 2 interrupt.

Interrupt 3 : Breakpoint

The type 3 interrupt is used to implement the breakpoint function in the system. It is produced by execution of the INT 3 instruction. The breakpoint function is often used as a debugging aid in cases where single stepping provides more detail than wanted. When you insert a breakpoint, the system executes the instructions up to the breakpoint and then goes to the breakpoint procedure. In the breakpoint procedure you can write a program to display register contents, memory contents and other information that is required to debug your program. You can insert as many breakpoints as you want in your program.

Interrupt 4 : Overflow Interrupt

The type 4 interrupt is used to check for an overflow condition after any signed arithmetic operation in the system. The Pentium overflow flag, OF, is set if the signed result of an arithmetic operation on two signed numbers is too large to be represented in the destination register or memory location. For example, if you add the 8-bit signed number 0111 1000 (+ 120 decimal) and the 8-bit signed number 0110 1010 (+ 106 decimal), the result is 1110 0010, which reads as - 30 decimal in two's complement instead of the true sum + 226. In signed numbers, the MSB (Most Significant Bit) indicates the sign, which leaves only the remaining 7 bits for the magnitude of a positive number.
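The overflow condition in this example can be checked with a small Python sketch of 8-bit two's complement addition. The function name `add_signed_8bit` is an illustrative assumption; the overflow test used here (both operands have the same sign and the result has the opposite sign) is the usual rule for signed addition.

```python
def add_signed_8bit(a, b):
    """Add two 8-bit signed values; return (raw bit pattern, signed result, overflow)."""
    if not (-128 <= a <= 127 and -128 <= b <= 127):
        raise ValueError("operands must fit in 8-bit signed range")
    raw = (a + b) & 0xFF                       # keep only the low 8 bits
    result = raw - 256 if raw & 0x80 else raw  # reinterpret as two's complement
    # Overflow: operands share a sign, but the result's sign differs from it.
    overflow = ((a >= 0) == (b >= 0)) and ((result >= 0) != (a >= 0))
    return raw, result, overflow


raw, result, overflow = add_signed_8bit(120, 106)
print(format(raw, "08b"), result, overflow)  # 11100010 -30 True
```

A program would test this condition with INTO or JO right after the arithmetic instruction, as the text goes on to describe.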
In the previous example, the sum of two positive 8-bit signed numbers appears negative because the true result is too large to fit in 7 bits. To detect this condition in the program, you can put the interrupt on overflow instruction, INTO, immediately after the arithmetic instruction. If the overflow flag is not set when the Pentium executes INTO, the instruction simply functions as a NOP (no operation). However, if the overflow flag is set, indicating an overflow error, the Pentium triggers a type 4 interrupt after executing the INTO instruction.

Another way to detect and respond to an overflow error in a program is to put the jump if overflow instruction (JO) immediately after the arithmetic instruction. If the overflow flag is set as a result of the arithmetic operation, execution jumps to the address specified in the JO instruction. At this address you can put an error routine which responds to the overflow in the way you want.

Interrupt 5 : Bounds Check

The Pentium triggers interrupt 5 if, while executing a BOUND instruction, it finds that the operand lies outside the specified limits.

Interrupt 6 : Invalid Opcode

This fault occurs when an invalid opcode is detected by the execution unit. (The exception is not detected until an attempt is made to execute the invalid opcode; i.e., prefetching an invalid opcode does not cause this exception.) No error code is pushed on the stack. The exception can be handled within the same task.

This exception also occurs when the type of operand is invalid for the given opcode. Examples include an intersegment JMP referencing a register operand, or an LES instruction with a register source operand.

Interrupt 7 : Coprocessor Not Available

This exception occurs in either of two conditions :

- The Pentium encounters an ESC (escape) instruction, and the EM (emulate) bit of CR0 (control register zero) is set.
- The Pentium encounters either the WAIT instruction or an ESC instruction, and both the MP (monitor coprocessor) and TS (task switched) bits of CR0 are set.

Interrupt 8 : Double Fault

Normally, when the Pentium detects an exception while trying to invoke the handler for a prior exception, the two exceptions can be handled serially. If, however, the Pentium cannot handle them serially, it signals the double-fault exception instead. To determine when two faults are to be signalled as a double fault, the Pentium divides the exceptions into three classes : benign exceptions, contributory exceptions, and page faults. Table 7.1 shows this classification. Table 7.2 shows which combinations of exceptions cause a double fault and which do not.

Class                     ID    Description
Benign Exceptions         1     Debug exceptions
                          2     NMI
                          3     Breakpoint
                          4     Overflow
                          5     Bounds check
                          6     Invalid opcode
                          7     Coprocessor not available
                          16    Coprocessor error
Contributory Exceptions   0     Divide error
                          9     Coprocessor segment overrun
                          10    Invalid TSS
                          11    Segment not present
                          12    Stack exception
                          13    General protection
Page Faults               14    Page fault

Table 7.1 Double-fault detection classes

                                          SECOND EXCEPTION
FIRST EXCEPTION           Benign Exception   Contributory Exception   Page Fault
Benign Exception          OK                 OK                       OK
Contributory Exception    OK                 DOUBLE                   OK
Page Fault                OK                 DOUBLE                   DOUBLE

Table 7.2 Double-fault definition

Interrupt 9 : Reserved by Intel

Interrupt 10 : Invalid TSS

Interrupt 10 occurs if, during a task switch, the new TSS is invalid. A TSS is considered invalid in the cases shown in Table 7.3. An error code is pushed onto the stack to help identify the cause of the fault.
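Returning briefly to the double-fault rules of Tables 7.1 and 7.2, the classification and the decision matrix can be written as a small Python lookup. The names `CLASS_OF` and `is_double_fault` are illustrative assumptions; the data is taken directly from the two tables.

```python
BENIGN, CONTRIBUTORY, PAGE_FAULT = "benign", "contributory", "page fault"

# Table 7.1: exception number -> class
CLASS_OF = {
    1: BENIGN, 2: BENIGN, 3: BENIGN, 4: BENIGN,
    5: BENIGN, 6: BENIGN, 7: BENIGN, 16: BENIGN,
    0: CONTRIBUTORY, 9: CONTRIBUTORY, 10: CONTRIBUTORY,
    11: CONTRIBUTORY, 12: CONTRIBUTORY, 13: CONTRIBUTORY,
    14: PAGE_FAULT,
}


def is_double_fault(first, second):
    """Table 7.2: does `second`, raised while invoking the handler for `first`,
    become a double fault (True) or get handled serially (False)?"""
    first_class, second_class = CLASS_OF[first], CLASS_OF[second]
    if first_class == CONTRIBUTORY:
        return second_class == CONTRIBUTORY
    if first_class == PAGE_FAULT:
        return second_class in (CONTRIBUTORY, PAGE_FAULT)
    return False  # a benign first exception never leads to a double fault


print(is_double_fault(13, 11))  # True: contributory followed by contributory
print(is_double_fault(13, 14))  # False: contributory followed by page fault
print(is_double_fault(14, 14))  # True: page fault followed by page fault
```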
The EXT bit indicates whether the exception was caused by a condition outside the control of the program; e.g., an external interrupt via a task gate triggered a switch to an invalid TSS.

Error Code              Condition
TSS id + EXT            The limit in the TSS descriptor is less than 103
LDT id + EXT            Invalid LDT selector or LDT not present
SS id + EXT             Stack segment selector is outside table limit
SS id + EXT             Stack segment is not a writable segment
SS id + EXT             Stack segment DPL does not match new CPL
SS id + EXT             Stack segment selector RPL <> CPL
CS id + EXT             Code segment selector is outside table limit
CS id + EXT             Code segment selector does not refer to code segment
CS id + EXT             DPL of non-conforming code segment <> new CPL
CS id + EXT             DPL of conforming code segment > new CPL
DS/ES/FS/GS id + EXT    DS, ES, FS, or GS segment selector is outside table limits
DS/ES/FS/GS id + EXT    DS, ES, FS, or GS is not readable segment

Table 7.3 Conditions that invalidate the TSS

Interrupt 11 : Segment Not Present

Exception 11 occurs when the Pentium detects that the present bit of a descriptor is zero. The Pentium triggers this fault in any of these cases :

- While attempting to load the CS, DS, ES, FS, or GS registers; loading the SS register, however, causes a stack fault.
- While attempting to load the LDT register with an LLDT instruction; loading the LDT register during a task switch operation, however, causes the "invalid TSS" exception.
- While attempting to use a gate descriptor that is marked not-present.

This fault is restartable. If the exception handler makes the segment present and returns, the interrupted program will resume execution.

Interrupt 12 : Stack Exception

A stack fault occurs in either of two general conditions :

- As a result of a limit violation in any operation that refers to the SS register.
  This includes stack-oriented instructions such as POP, PUSH, ENTER, and LEAVE, as well as other memory references that implicitly use SS (for example, MOV AX, [BP + 8]). ENTER causes this exception when the stack is too small for the indicated local-variable space.
• When attempting to load the SS register with a descriptor that is marked not-present but is otherwise valid. This can occur in a task switch, an interlevel CALL, an interlevel return, an LSS instruction, or a MOV or POP instruction to SS.

When the Pentium detects a stack exception, it pushes an error code onto the stack of the exception handler. If the exception is due to a not-present stack segment or to overflow of the new stack during an interlevel CALL, the error code contains a selector to the segment in question (the exception handler can test the present bit in the descriptor to determine which exception occurred); otherwise the error code is zero.

Interrupt 13 : General Protection Exception

All protection violations that do not cause another exception cause a general protection exception. This includes:

1. Exceeding segment limit when using CS, DS, ES, FS, or GS
2. Exceeding segment limit when referencing a descriptor table
3. Transferring control to a segment that is not executable
4. Writing into a read-only data segment or into a code segment
5. Reading from an execute-only segment
6. Loading the SS register with a read-only descriptor (unless the selector comes from the TSS during a task switch, in which case a TSS exception occurs)
7. Loading SS, DS, ES, FS, or GS with the descriptor of a system segment
8. Loading DS, ES, FS, or GS with the descriptor of an executable segment that is not also readable
9. Loading SS with the descriptor of an executable segment
10. Accessing memory via DS, ES, FS, or GS when the segment register contains a null selector
11. Switching to a busy task
12. Violating privilege rules
13. Loading CR0 with PG = 1 and PE = 0
14. Interrupt or exception via trap or interrupt gate from V86 mode to privilege level other than zero
15. Exceeding the instruction length limit of 15 bytes (this can occur only if redundant prefixes are placed before an instruction)

The general protection exception is a fault. In response to a general protection exception, the Pentium pushes an error code onto the exception handler's stack. If loading a descriptor causes the exception, the error code contains a selector to the descriptor; otherwise, the error code is null.

Interrupt 14 : Page Fault

This exception occurs when paging is enabled (PG = 1) and the Pentium detects one of the following conditions while translating a linear address to a physical address:

• The page-directory or page-table entry needed for the address translation has zero in its present bit.
• The current procedure does not have sufficient privilege to access the indicated page.

Interrupt 15 : Reserved by Intel

Interrupt 16 : Coprocessor Error

The Pentium reports this exception when it detects a signal from the 80287 or 80387 on the Pentium's ERROR input pin. The Pentium tests this pin only at the beginning of certain ESC instructions and when it encounters a WAIT instruction while the EM bit of the MSW is zero (no emulation).

Table 7.4 shows the interrupt and exception summary.

Description                   Interrupt   Return address      Type          Function that can generate
                              number      points to faulting                the exception
                                          instruction
Divide error                  0           YES                 FAULT         DIV, IDIV
Debug exceptions              1           YES                 TRAP          Any instruction
NMI                           2           NO                  NMI           INT 2 or NMI
Breakpoint                    3           NO                  TRAP          One-byte INT 3
Overflow                      4           NO                  TRAP          INTO
Bounds check                  5           YES                 FAULT         BOUND
Invalid opcode                6           YES                 FAULT         Any illegal instruction
Coprocessor not available     7           YES                 FAULT         ESC, WAIT
Double fault                  8           YES                 ABORT         Any instruction that can generate an exception
Coprocessor segment overrun   9           NO                  ABORT         Any operand of an ESC instruction that wraps around the end of a segment
Invalid TSS                   10          YES                 FAULT         JMP, CALL, IRET, any interrupt
Segment not present           11          YES                 FAULT         Any segment-register modifier
Stack exception               12          YES                 FAULT         Any memory reference through SS
General protection            13          YES                 FAULT/ABORT   Any memory reference or code fetch
Page fault                    14          YES                 FAULT         Any memory reference or code fetch
Reserved by Intel             15          -                   -             -
Coprocessor error             16          YES                 FAULT         ESC, WAIT
Two-byte SW interrupt         0-255       NO                  TRAP          INT n

Table 7.4 Interrupt and exception summary

7.3 Enabling and Disabling Interrupts

Certain conditions and flag settings cause the Pentium to inhibit/mask certain interrupts and exceptions at instruction boundaries. These conditions and settings are:

7.3.1 NMI Masks Further NMIs

While an NMI handler is executing, the Pentium ignores further interrupt signals at the NMI pin until the next IRET instruction is executed.

7.3.2 IF Masks INTR

The IF (interrupt-enable flag) controls the acceptance of external interrupts routed via the INTR pin. When IF = 0, INTR interrupts are masked; when IF = 1, INTR interrupts are enabled. In response to RESET, the IF flag is cleared. It can be set or reset by the STI and CLI instructions, respectively. These instructions may be executed only if CPL ≤ IOPL. A protection exception occurs if they are executed when CPL > IOPL.

7.3.3 RF Masks Debug Faults

The RF bit in EFLAGS controls the recognition of debug faults. This allows debug faults to be raised for a given instruction at most once, no matter how many times the instruction is restarted.

7.4 Priority Among Simultaneous Interrupts and Exceptions

If more than one interrupt or exception is pending at an instruction boundary, the Pentium services one of them at a time. The priority among types of interrupt and exception sources is shown in Table 7.5.
The Pentium first services a pending interrupt or exception from the type that has the highest priority, transferring control to the first instruction of the interrupt handler. Lower priority exceptions are discarded; lower priority interrupts are held pending. Discarded exceptions will be rediscovered when the interrupt handler returns control to the point of interruption.

Priority    Types of Interrupt or Exception
HIGHEST     Faults except debug faults
            Trap instructions INTO, INT n, INT 3
            Debug traps for this instruction
            Debug faults for next instruction
            NMI interrupt
LOWEST      INTR interrupt

Table 7.5

7.5 Handling Interrupts and Exceptions in Real Mode

The Pentium supports Real Mode interrupts and exceptions much like the 8086. In the Pentium, addresses from 0 through 3FFH (400H memory locations) are dedicated to the Interrupt Descriptor Table (IDT) after Reset. This table contains pointers that define the starting points of the interrupt service routines. Each pointer in the table requires four bytes of memory. Thus, the table contains up to 256 (4 × 256 = 1024 = 400H) interrupt pointers. The four bytes in each pointer represent two words. The word at the higher memory address holds the segment base address, whereas the word at the lower memory address holds the offset. Fig. 7.1 shows the Interrupt Descriptor Table (IDT). As in the 8086, interrupts are recognized by their numbers/types. Each time an interrupt occurs, the Pentium multiplies the interrupt number/type by four to generate an index into the interrupt descriptor table.

Fig. 7.1 Interrupt descriptor table (two-word gates for interrupt #0 through interrupt #n at increasing memory addresses, located through the IDTR)

In the Pentium, the Interrupt Descriptor Table is relocatable. The base address of the interrupt descriptor table is held in the IDTR (Interrupt Descriptor Table Register).
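The real-mode lookup described above can be sketched in a few lines of Python. This is only an illustrative model, not processor code: memory is a plain byte array, and the names `read_vector`, `isr_physical_address`, `memory` and `idt_base` are hypothetical. The address formula segment × 16 + offset is the standard 8086 real-mode rule and is assumed here rather than taken from this section.

```python
import struct

def read_vector(memory: bytes, vector: int, idt_base: int = 0):
    """Return (segment, offset) of the ISR for a real-mode interrupt number.

    Assumption: `memory` is a flat image of physical memory and `idt_base`
    is the table base held in the IDTR (0 after reset).
    """
    if not 0 <= vector <= 255:
        raise ValueError("interrupt number must be in 0..255")
    index = idt_base + vector * 4          # four bytes per pointer
    # Little-endian words: offset at the lower address, segment above it.
    offset, segment = struct.unpack_from("<HH", memory, index)
    return segment, offset

def isr_physical_address(segment: int, offset: int) -> int:
    """Standard 8086 real-mode address formation (assumed, see lead-in)."""
    return (segment << 4) + offset

# Build a 1 KB table (256 pointers x 4 bytes) and install a pointer for
# interrupt 8 with offset 0100H and segment 2000H (arbitrary example values).
table = bytearray(0x400)
struct.pack_into("<HH", table, 8 * 4, 0x0100, 0x2000)

seg, off = read_vector(bytes(table), 8)
print(hex(seg), hex(off), hex(isr_physical_address(seg, off)))
```

Passing a non-zero `idt_base` models a table that has been relocated through the IDTR.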
The programmer can change this address by loading a different address into the IDTR with the LIDT instruction. The LIDT instruction relocates the base address and is also used to specify the size of the IDT. If an interrupt occurs and the corresponding entry in the interrupt table is beyond the limit stored in the IDTR, exception 8 (interrupt table limit too small) will occur. Table 7.6 summarises the Pentium Real Address Mode exceptions.

Interrupt   Cause of Exception                                  Description
Number
0           DIV, IDIV                                           Divide error
1           All                                                 Debug exceptions
3           INT                                                 Breakpoint
4           INTO                                                Overflow
5           BOUND                                               Bounds check
6           Any undefined opcode or LOCK used with wrong        Invalid opcode
            instruction
7           ESC or WAIT                                         Coprocessor not available
8           INT vector is not within IDTR limit                 Interrupt table limit too small
9-11                                                            Reserved
12          Memory operand crosses offset 0 or 0FFFFH           Stack fault
13          Memory operand crosses offset 0FFFFH or attempt     Pseudo-protection exception
            to execute past offset 0FFFFH or instruction
            longer than 15 bytes
14, 15                                                          Reserved
16          ESC or WAIT                                         Coprocessor error
0-255       INT n                                               Two-byte software interrupt

Table 7.6 Pentium real-address mode exceptions

Note 1: Some debug exceptions point to the faulting instruction, others to the next instruction. By examining the contents of DR6, it is possible to determine whether the debug exception is pointing to the faulting instruction or to the next instruction.

Note 2: Coprocessor errors are reported on the first ESC or WAIT instruction after the ESC instruction that caused the error.

7.6 Handling Interrupts and Exceptions in Protected Mode

In protected mode, each interrupt or exception is associated with a descriptor that gives information about its interrupt service routine. These descriptors are stored in a special descriptor table called the interrupt descriptor table or IDT. This table can be located anywhere in memory.
Like the GDT and LDTs, the IDT is an array of 8-byte descriptors. The base address and limit for the interrupt descriptor table are loaded into the interrupt descriptor table register (IDTR) as shown in Fig. 7.2. Because there are only 256 identifiers, the IDT need not contain more than 256 descriptors. Three types of descriptors can be used in the IDT:

• Trap gate descriptor
• Interrupt gate descriptor
• Task gate descriptor

If any other type of descriptor is found in the IDT when an exception occurs, the Pentium generates a general protection fault.

TASK and INTERRUPT Gates

Task gate descriptors were discussed earlier. Trap gate and interrupt gate descriptors are introduced in this section. These descriptors contain pointers to segment descriptors. They are similar to call gates. The most noticeable difference is the absence of a word count field for passing parameters to the stack.

Fig. 7.2 Interrupt descriptor table and IDTR (gate descriptors for INT 0 (divide error), INT 1 (debug exceptions), INT 2 (non-maskable interrupt), INT 3 (breakpoint), INT 4 (overflow), up to INT 255 (user defined))

Fig. 7.3 Pentium IDT gate descriptors (task gate: P, DPL, type 00101 and selector, with the remaining fields not used; interrupt gate: offset 31..16, P, DPL, type 01110, selector and offset 15..0; trap gate: as the interrupt gate, with type 01111)

When an interrupt or exception occurs, its identifier (a number from 0 to 255) is multiplied by 8 and added to the IDT base address stored in the IDTR. The result is a pointer to a gate descriptor in the interrupt descriptor table. The gate descriptor can be any of three types: an interrupt gate, a trap gate, or a task gate.
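A minimal Python sketch of this lookup follows, together with a decoder for the 8-byte gate. It is a model for illustration only. The function names `gate_address` and `decode_gate` are hypothetical, and the byte layout (offset 15..0, selector, one unused byte, access byte with P, DPL and the 5-bit type, offset 31..16) is the usual 32-bit gate format that Fig. 7.3 depicts, stated here as an assumption.

```python
import struct

# 5-bit type fields as listed for Fig. 7.3.
GATE_TYPES = {
    0b00101: "task gate",
    0b01110: "interrupt gate",
    0b01111: "trap gate",
}

def gate_address(idt_base: int, vector: int) -> int:
    """Address of the gate descriptor: identifier x 8 plus the IDT base."""
    if not 0 <= vector <= 255:
        raise ValueError("identifier must be in 0..255")
    return idt_base + vector * 8

def decode_gate(descriptor: bytes) -> dict:
    """Split an 8-byte IDT gate into its fields (assumed layout, see lead-in)."""
    if len(descriptor) != 8:
        raise ValueError("a gate descriptor is exactly 8 bytes")
    offset_low, selector, _unused, access, offset_high = struct.unpack(
        "<HHBBH", descriptor
    )
    type_bits = access & 0x1F
    if type_bits not in GATE_TYPES:
        # Any other descriptor type in the IDT leads to a general protection fault.
        raise ValueError("descriptor type not allowed in the IDT")
    return {
        "type": GATE_TYPES[type_bits],
        "present": bool(access & 0x80),
        "dpl": (access >> 5) & 0x3,
        "selector": selector,
        # For a task gate the offset fields are unused.
        "offset": (offset_high << 16) | offset_low,
    }

# Example: a present interrupt gate, DPL 0, selector 0008H, offset 00123456H
# (arbitrary illustrative values).
example = struct.pack("<HHBBH", 0x3456, 0x0008, 0x00, 0x8E, 0x0012)
print(hex(gate_address(0x1000, 14)), decode_gate(example))
```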
The selector in the gate identifies a segment descriptor in the GDT or LDT. The 32-bit base address from that segment descriptor and the 32-bit offset from the gate are added to generate the linear address of the actual interrupt procedure, as shown in Fig. 7.4.

Fig. 7.4 Interrupt vectoring for procedures (the interrupt ID selects a trap gate or interrupt gate in the IDT; the gate selects an LDT or GDT descriptor for the executable segment, and the gate offset gives the entry point)

Then the contents of EFLAGS and the information needed for returning to the original procedure are stored on the stack, and control is transferred to the interrupt or exception handling procedure. Before the Pentium passes control, however, it pushes at least 12 bytes onto the ISR's stack. Fig. 7.5 shows the information that is stacked before control is transferred to the interrupt or exception handling procedure. As shown in Fig. 7.5, if an exception produces an error code, it is saved on the stack. If the exception requires a privilege change, the old stack segment and stack pointer are also saved on the stack.

Fig. 7.5 Information stored onto the stack (EFLAGS, CS and EIP in every case; four cases are shown, with or without a privilege change and with or without an error code)

Trap Gate Vs Interrupt Gate

A trap gate operates exactly like an interrupt gate in all respects except one. When an exception vectors through a trap gate, all flags remain exactly as they were when the exception occurred (no change in flag status). When an exception vectors through an interrupt gate, the Pentium resets IF (Interrupt Flag) to disable further hardware interrupts, after it pushes the return address and EFLAGS but before it executes the first instruction of the ISR.

7.7 Returning from an Interrupt Procedure

An interrupt procedure differs slightly from a normal procedure in the way it is exited.
While returning from the interrupt it is necessary to restore EFLAGS from the stack. Thus the IRET instruction is used to exit from an interrupt procedure instead of the RET instruction. The IRET instruction increments ESP by an extra four bytes (because of the flags on the stack) and loads the saved flags into the EFLAGS register. The IRET instruction then loads CS and EIP so that they point back to the procedure that was interrupted. If a privilege change occurred, it also loads the old stack segment and stack pointer.

Processing Interrupt Service Routines

• The IDT stores the descriptors for interrupt service routines.
• Only trap gate, interrupt gate and task gate descriptors are allowed.
• ISRs operate like programmed procedures/subroutines.
• Before transferring control, all registers that are used are saved.
• ISRs are invoked by interrupts or exceptions instead of CALL instructions.
• ISRs terminate with IRET instead of RET instructions.

Privilege levels

Like call gates, interrupt and trap gates have privilege levels associated with them. The DPL field of a trap, task or interrupt gate determines the minimum privilege level required to pass through the gate: the CPL must be of equal or higher privilege than the gate's DPL. It is recommended that the gate's DPL field always be kept at privilege level 3, so that a program at any privilege level can pass through the gate and have its exceptions handled. The other condition that must be satisfied is that the DPL of the exception handler's code must be of equal or higher privilege than the CPL, because an exception cannot transfer control to less privileged code.

Exception handler privilege levels

• The exception gate's DPL must be of equal or lesser privilege than the CPL (numerically, gate DPL ≥ CPL).
• The exception code's DPL must be of equal or higher privilege than the CPL (numerically, code DPL ≤ CPL).

Task gates

The third alternative for an exception gate is a task gate. When an exception identifier selects a task gate, the Pentium performs an immediate task switch. The task activated is determined by the TSS selector stored in the task gate descriptor.
The TSS selector of the current task, which is now dormant, is copied into the Back Link field of the new task. The new task will have its NT (nested task) bit set in EFLAGS. When the exception handling task completes and executes an IRET instruction, the Pentium activates the interrupted task based on the back link information.

Advantages of using a task gate over trap and interrupt gates:

• The entire context of the interrupted task is saved automatically.
• The exception handler does not need to be concerned with contaminating the interrupted code.
• The exception handler can run at any privilege level.
• The exception handler can use its own private code and data space because it can have its own LDT.

Drawbacks of using a task gate over trap and interrupt gates:

• More time is required to perform the task switch.
• A task gate cannot specify where in the task to begin execution. Dormant tasks always resume where they left off.
• It is difficult to retrieve any information about the interrupted code when it is in a different task.

7.8 Interrupts and Exceptions in Virtual 8086 Mode

Hardware interrupts, software exceptions, and processor aborts, traps and faults are handled differently in virtual 8086 mode than they are in protected mode. The Pentium operating system determines whether the interrupt comes from a protected mode application or from a virtual 8086 mode program by examining the VM bit in EFLAGS. In virtual 8086 mode, exception processing uses either an 8086 style ISR or a protected mode ISR.

7.8.1 Protected Mode ISR

When an interrupt or exception occurs in virtual 8086 mode, the Pentium first switches from virtual 8086 mode to protected mode. Then it locates the current task's TSS (pointed to by TR) and reads the privilege level 0 stack selector and stack pointer. It pushes the current CS, EIP (32-bit) and EFLAGS (32-bit) onto this stack. This operation is similar to the 8086 interrupt processing operation.
It also pushes SS and ESP onto the stack. If the exception generates an error code, the error code is also saved onto the stack. It also saves all four data segment registers (DS, ES, FS and GS) and loads zero into these segment registers before starting to execute the handler procedure. Fig. 7.6 shows the protected mode privilege level 0 stack and the virtual 8086 program stack after recognition of the exception but just before the beginning of the ISR execution.

Fig. 7.6 Protected mode privilege level 0 stack and virtual 8086 program stack (the privilege level 0 stack holds GS, FS, DS, ES, SS, ESP, EFLAGS, CS, EIP and the optional error code, with SS:ESP pointing at EIP when there is no error code and at the error code otherwise; the virtual 8086 program stack at SS:SP is unchanged)

All four data segment registers are loaded with zeroes because virtual 8086 memory segmentation differs from protected mode segmentation. In virtual 8086 mode, a segment register gives the segment's base address, whereas in protected mode the contents of a segment register give a 13-bit selector to a segment descriptor, one bit to select a descriptor table and two more bits to determine or affect privilege level protection. If the ISR began with the virtual 8086 mode values, it could easily make wrong memory accesses. Therefore, it is necessary to flush all six segment registers during the transition from virtual 8086 mode to protected mode.

Virtual 8086 Mode Exception Processing

• The Pentium automatically changes from Virtual 8086 to Protected mode.
• The Protected mode PL0 stack is used at all times.
• The Virtual 8086 stack remains unused.
• The 32-bit register contents are saved.
• The four data segment registers are saved.

To return from the ISR, the Pentium executes an IRET instruction. The Pentium itself checks the VM bit when it encounters an IRET instruction.
If it finds the VM bit set as it pops EFLAGS from the stack, it restores the four 8086-style segment registers before returning to the Virtual 8086 Mode program.

Terminating Virtual 8086 Mode Exception Processing

• The interrupt service routine executes an IRET instruction.
• The Pentium removes information from the protected mode PL0 stack.
• The VM bit in EFLAGS identifies the Virtual 8086 mode caller.
• The Pentium restores the four data segment registers.

7.8.2 8086 Style ISR

When an interrupt or exception occurs in virtual 8086 mode, it is possible to use 8086 style ISRs and access the interrupt vector table. However, the Pentium does not allow this table to be used directly. Whenever an exception occurs, the Pentium does the task switch and sets the NT bit. Unfortunately, the NT bit is checked only when IRET is executed in Protected Mode; NT is ignored in Virtual 8086 Mode. Thus a virtual 8086 task cannot return to its parent task with an IRET instruction. To avoid this, it is necessary to use either a protected mode task or a protected mode procedure to handle all virtual 8086 mode exceptions. The following steps are involved in handling an 8086 style ISR.

1. When an interrupt or exception occurs in virtual 8086 mode, control is given to the PL0 Protected Mode ISR and the virtual 8086 mode program's status is pushed onto the top of the ISR's stack.
2. The Pentium copies IP, CS, and FLAGS (only 16 bits) from the ISR's stack onto the virtual 8086 mode program's stack, and modifies the virtual 8086 mode program's stack pointer.
3. It stores all of the information on the ISR's stack.
4. It pushes an EFLAGS register image (32-bit) with bit 17 (VM) set and bits 12 and 13 (IOPL) cleared. It also pushes an 8086 style CS segment and EIP.
5. It then executes an IRET instruction and terminates the protected mode ISR. Due to this, program control is returned to the 8086 exception handler and not to the interrupted program.
6. The 8086 ISR executes and finally issues an IRET instruction, which generates a general protection fault.
7. The general protection handler then transfers control to the first ISR (the Protected mode ISR).
8. The Pentium then reads the IP, CS and FLAGS from the virtual 8086 mode program's stack and adjusts SP accordingly.
9. It then reads the information stored in step 3 from the ISR's stack and executes another IRET instruction. This terminates the protected mode ISR and restores the return address and EFLAGS register status.
10. At the end, the Pentium resumes the execution of the interrupted program.

Using 8086 Interrupt Service Routines

• The Pentium cannot directly utilize 8086 ISR code.
• The exception handler must run in Protected mode.
• The exception handler can "reflect" the interrupt to an 8086 ISR.
• The 8086 ISR must return to the Protected mode exception handler.
• The exception handler returns to the interrupted code.

7.9 I/O Handling In Pentium

In addition to transferring data to and from external memory, Pentium processors can also transfer data to and from input/output ports (I/O ports). I/O ports are created in system hardware by circuitry that decodes the control, data, and address pins on the processor. These I/O ports are then configured to communicate with peripheral devices. An I/O port can be an input port, an output port, or a bidirectional port. Some I/O ports are used for transmitting data, such as to and from the transmit and receive registers, respectively, of a serial interface device. Other I/O ports are used to control peripheral devices, such as the control registers of a disk controller.

This section describes the processor's I/O architecture.
The topics discussed include:

• I/O port addressing
• I/O instructions
• I/O protection mechanism

7.9.1 I/O Port Addressing

The Pentium processor permits applications to access I/O ports in either of two ways:

• Through a separate I/O address space
• Through memory-mapped I/O

Accessing I/O ports through the I/O address space is handled through a set of I/O instructions and a special I/O protection mechanism. Accessing I/O ports through memory-mapped I/O is handled with the processor's general-purpose move and string instructions, with protection provided through segmentation or paging. I/O ports can be mapped so that they appear in the I/O address space or the physical-memory address space (memory-mapped I/O) or both.

One benefit of using the I/O address space is that writes to I/O ports are guaranteed to be completed before the next instruction in the instruction stream is executed. Thus, I/O writes to control system hardware cause the hardware to be set to its new state before any other instructions are executed.

7.9.2 I/O Port Hardware

From a hardware point of view, I/O addressing is handled through the processor's address lines. In Pentium processors, the M/IO pin indicates a memory address (1) or an I/O address (0). When the separate I/O address space is selected, it is the responsibility of the hardware to decode the memory-I/O bus transaction to select I/O ports rather than memory. Data is transmitted between the processor and an I/O device through the data lines.

7.9.3 I/O Address Space

The processor's I/O address space is separate and distinct from the physical-memory address space. The I/O address space consists of 2^16 (64K) individually addressable 8-bit I/O ports, numbered 0 through FFFFH. I/O port addresses 0F8H through 0FFH are reserved; I/O ports should not be assigned to these addresses.

Any two consecutive 8-bit ports can be treated as a 16-bit port, and any four consecutive ports can be a 32-bit port.
In this manner, the processor can transfer 8, 16, or 32 bits to or from a device in the I/O address space. Like words in memory, 16-bit ports should be aligned to even addresses (0, 2, 4, ...) so that all 16 bits can be transferred in a single bus cycle. Likewise, 32-bit ports should be aligned to addresses that are multiples of four (0, 4, 8, ...). The Pentium processor supports data transfers to unaligned ports, but there is a performance penalty because one or more extra bus cycles are required to complete the data transfer.

If hardware or software requires that I/O ports be written to in a particular order, that order must be specified explicitly. For example, to load a word-length I/O port at address 2H and then another word port at 4H, two word-length writes must be used, rather than a single doubleword write at 2H.

Note that the processor does not mask parity errors for bus cycles to the I/O address space. Accessing I/O ports through the I/O address space is thus a possible source of parity errors.

7.9.4 Memory-Mapped I/O

I/O devices that respond like memory components can be accessed through the Pentium processor's physical-memory address space as shown in Fig. 7.7. When using memory-mapped I/O, any of the Pentium processor's instructions that reference memory can be used to access an I/O port located at a physical-memory address. For example, the MOV instruction can transfer data between any register and a memory-mapped I/O port. The AND, OR, and TEST instructions may be used to manipulate bits in the control and status registers of memory-mapped peripheral devices.

Fig. 7.7 Memory-mapped I/O (I/O ports placed within the physical memory space, which extends up to FFFF FFFFH)

When using memory-mapped I/O, caching of the address space mapped for I/O operations must be prevented. The Pentium provides the KEN pin, which when held inactive (high) prevents caching of all addresses sent out on the system bus.
To use this pin, external address decoding logic is required to block caching in specific address spaces. The Pentium processor also provides the PCD (page-level cache disable) flag in page table and page directory entries. This flag allows caching to be disabled on a page-by-page basis.

7.9.5 I/O Instructions

The processor's I/O instructions provide access to I/O ports through the I/O address space. (These instructions cannot be used to access memory-mapped I/O ports.) There are two groups of I/O instructions:

• Those that transfer a single item (byte, word, or doubleword) between an I/O port and a general-purpose register.
• Those that transfer strings of items (strings of bytes, words, or doublewords) between an I/O port and memory.

The register I/O instructions IN (input from I/O port) and OUT (output to I/O port) move data between I/O ports and the EAX register (32-bit I/O), the AX register (16-bit I/O), or the AL register (8-bit I/O). The address of the I/O port can be given with an immediate value or a value in the DX register.

The string I/O instructions INS (input string from I/O port) and OUTS (output string to I/O port) move data between an I/O port and a memory location. The address of the I/O port being accessed is given in the DX register; the source or destination memory address is given in the DS:ESI or ES:EDI register, respectively.

When used with one of the repeat prefixes (such as REP), the INS and OUTS instructions perform string (or block) input or output operations. The repeat prefix REP modifies the INS and OUTS instructions to transfer blocks of data between an I/O port and memory. Here, the ESI or EDI register is incremented or decremented (according to the setting of the DF flag in the EFLAGS register) after each byte, word, or doubleword is transferred between the selected I/O port and memory.

See the individual references for IN, INS, OUT, and OUTS in Chapter 3.
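The behaviour of a repeated OUTS can be illustrated with a small Python model. This is a sketch under simplifying assumptions, not a description of the hardware: segmentation is ignored (ESI is used as a plain index into a byte buffer), ECX is taken as the repeat count, the port is modelled as a list that records each write, and the function name `rep_outs` and port number 3F8H are hypothetical choices for the example.

```python
def rep_outs(memory: bytes, esi: int, ecx: int, dx: int, size: int,
             df: int, port_writes: list):
    """Model REP OUTS: send `ecx` items of `size` bytes from memory to port `dx`.

    Returns the final (esi, ecx). `df` is the direction flag: 0 makes ESI
    step upwards after each item, 1 makes it step downwards.
    """
    if size not in (1, 2, 4):
        raise ValueError("item size must be a byte, word or doubleword")
    step = -size if df else size
    while ecx > 0:
        # Read one little-endian item at ESI and "write" it to the port.
        item = int.from_bytes(memory[esi:esi + size], "little")
        port_writes.append((dx, item))
        esi += step        # ESI moves after each transferred item
        ecx -= 1           # REP counts down until ECX reaches zero
    return esi, ecx

# Send two words from a four-byte buffer to an example port.
buffer = bytes([0x11, 0x22, 0x33, 0x44])
writes = []
esi, ecx = rep_outs(buffer, esi=0, ecx=2, dx=0x3F8, size=2, df=0,
                    port_writes=writes)
print([(hex(port), hex(value)) for port, value in writes], esi, ecx)
```

INS would be the mirror image, reading from the port and storing each item at EDI.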
7.9.6 Protected-Mode I/O

When the processor is running in protected mode, the following protection mechanisms regulate access to I/O ports.

• When accessing I/O ports through the I/O address space, two protection devices control access:
  - The I/O privilege level (IOPL) field in the EFLAGS register
  - The I/O permission bit map of a task state segment (TSS)
• When accessing memory-mapped I/O ports, the normal segmentation and paging protection also affect access to I/O ports.

7.9.7 Ordering I/O

When controlling I/O devices it is often important that memory and I/O operations be carried out in precisely the order programmed. For example, a program may write a command to an I/O port, then read the status of the I/O device from another I/O port. It is important that the status returned be the status of the device after it receives the command, not before.

When using memory-mapped I/O, caution should be taken to avoid situations in which the programmed order is not preserved by the processor. To optimize performance, the processor allows cacheable memory reads to be reordered ahead of buffered writes in most situations. Internally, processor reads (cache hits) can be reordered around buffered writes. When using memory-mapped I/O, therefore, it is possible that an I/O read might be performed before the memory write of a previous instruction. Thus, it is recommended that caching of the address space mapped for I/O operations be prevented. Refer to Section 7.9 for more details.

Review Questions

1. Compare Polling and Interrupt method.
2. Give a one-line explanation for the following terms with respect to Pentium: a) Faults, b) Traps, c) Aborts.
3. Describe interrupts and exceptions in Real Mode.
4. Explain protected mode interrupt processing.
5. What is the difference between a trap gate and an interrupt gate?
6. Discuss the advantages and disadvantages of a task gate used for interrupt processing over a trap gate and an interrupt gate.
7. Describe interrupts and exceptions in Virtual 8086 Mode.
8. Write a short note on I/O handling in Pentium.
