CSE_ESD_LN_UG24

1
INSTITUTE OF AERONAUTICAL ENGINEERING

(AUTONOMOUS )
Dundigal, Hyderabad - 500 043
L ECTURE N OTES :
EMBEDDED SYSTEMS(AECC40)
D RAFTED BY :
M R . B. BRAHMAIAH ( IARE 10950)
Assistant Professor
D EPARTMENT O F COMPUTER SCIENCE AND ENGINEERING

I NSTITUTE OF A ERONAUTICAL E NGINEERING
July 22, 2023
Contents
Contents 1
List of Figures 5
1 EMBEDDED COMPUTING 1
1.1 Definition of Embedded System . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 The components of embedded system hardware: . . . . . . . . . . . . . 2
1.2 Embedded Systems Vs. General Computing Systems: . . . . . . . . . . . . . . . 4
1.3 History of Embedded Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Classification of Embedded Systems . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4.1 Classification Based on Generation . . . . . . . . . . . . . . . . . . . . 7
1.4.1.1 First Generation: . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4.1.2 Second Generation: . . . . . . . . . . . . . . . . . . . . . . . 8
1.4.1.3 Third Generation: . . . . . . . . . . . . . . . . . . . . . . . . 8
1.4.1.4 Fourth Generation: . . . . . . . . . . . . . . . . . . . . . . . . 8
1.4.1.5 What Next? . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4.2 Classification Based on Complexity and Performance . . . . . . . . . . 9
1.4.2.1 Small-Scale Embedded Systems . . . . . . . . . . . . . . . . 9
1.4.2.2 Medium-Scale Embedded Systems . . . . . . . . . . . . . . . 9
1.4.2.3 Large-Scale Embedded Systems/Complex Systems . . . . . . 9
1.4.3 Classification Based on deterministic behavior . . . . . . . . . . . . . . 10
1.4.4 Classification Based on triggering . . . . . . . . . . . . . . . . . . . . . 10
1.5 Complex Systems and Microprocessors . . . . . . . . . . . . . . . . . . . . . . 10
1.5.1 Digital camera: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.6 Major Application Areas of Embedded Systems . . . . . . . . . . . . . . . . . . 13
1.7 The Embedded System Design Process . . . . . . . . . . . . . . . . . . . . . . . 14
1.8 Formalisms For System Design . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.8.1 Structural Description: . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.8.2 Behavioral Description: . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.9 Design Example: Model Train Controller . . . . . . . . . . . . . . . . . . . . . 31
2 INTRODUCTION TO EMBEDDED C AND APPLICATIONS 44

2.1 C LOOPING STRUCTURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.1.1 LOOPS WITH A FIXED NUMBER OF ITERATIONS . . . . . . . . . 44
2.1.2 LOOPS USING A VARIABLE NUMBER OF ITERATIONS . . . . . . 46
2.1.3 LOOP UNROLLING . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
1
Contents 2
2.2 REGISTER ALLOCATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

2.3 FUNCTION CALLS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.3.1 POINTER ALIASING: . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.3.2 UNALIGNED DATA AND ENDIANNESS: . . . . . . . . . . . . . . . 52
2.3.3 INLINE FUNCTIONS AND INLINE ASSEMBLY: . . . . . . . . . . . 53
2.4 EMBEDDED SYSTEMS PROGRAMMING IN C: . . . . . . . . . . . . . . . . 53
2.4.1 ’C’ v/s. ’Embedded C’: . . . . . . . . . . . . . . . . . . . . . . . . . . 54
2.4.2 Compiler vs. Cross-Compiler: . . . . . . . . . . . . . . . . . . . . . . . 54
2.4.3 Using ‘C’ in ‘Embedded C’: . . . . . . . . . . . . . . . . . . . . . . . . 56
2.4.4 Arithmetic and Relational Operations: . . . . . . . . . . . . . . . . . . . 57
2.4.5 Looping Instructions: . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
2.4.6 Arrays and Pointers: . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
2.4.7 Pointers: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
2.5 BINDING AND RUNNING EMBEDDED C PROGRAM IN KEIL IDE: . . . . 60
2.5.1 BASIC TECHNIQUES FOR READING FROM PORT PINS: . . . . . . 66
2.6 SWITCH BOUNCE: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
2.7 APPLICATIONS: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
2.7.1 LED INTERFACING WITH 8051 TO A SINGLE PIN: . . . . . . . . . 70
2.7.2 LED’s interfacing with Port, P1 of 8051: . . . . . . . . . . . . . . . . . 71
2.7.3 4X4 MATRIX KEYPAD INTERFACING WITH 8051 MICROCONTROLLER: 73
2.7.4 Interfacing keypad with 8051 microcontroller (P89V51RD2): . . . . . . 73
2.7.5 7 SEGMENT DISPLAY INTERFACING WITH 8051 MICROCONTROLLER: 79
2.7.6 LCD DISPLAY INTERFACING WITH 8051 MICROCONTROLLER: . 82
2.8 ADC (ADC0808) INTERFACING WITH 8051 MICROCONTROLLER: . . . . 86
2.9 DAC INTERFACING WITH 8051 MICROCONTROLLER: . . . . . . . . . . . 90
2.9.1 Digital-to-analog (DAC) converter . . . . . . . . . . . . . . . . . . . . . 90
2.9.2 MC1408 DAC (or DAC0808) . . . . . . . . . . . . . . . . . . . . . . . 91
2.9.3 Converting lout to voltage in DAC0808 . . . . . . . . . . . . . . . . . . 92
2.9.4 Generating a sine wave: . . . . . . . . . . . . . . . . . . . . . . . . . . 92
2.10 MULTIPLE INTERRUPTS IN 8051 MICROCONTROLLER: . . . . . . . . . . 93
2.10.1 Interrupts vs. polling: . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
2.10.2 SIX INTERRUPTS IN THE 8051 MICROCONTROLLER: . . . . . . . 94
2.10.3 Enabling and Disabling an interrupt: . . . . . . . . . . . . . . . . . . . . 95
2.11 IE (Interrupt Enable) Register: . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
2.12 SERIAL COMMUNICATION PROGRAMMING: . . . . . . . . . . . . . . . . 97
2.12.1 Synchronous Communication: . . . . . . . . . . . . . . . . . . . . . . . 97
2.13 Baud rate in the 8051 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
2.13.1 Baud rate selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
3 RTOS BASED EMBEDDED SYSTEM DESIGN 101

3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
3.2 The Kernel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
3.3 Types of Operating Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
3.4 The structure of a Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
3.5 Memory organization of Processes: . . . . . . . . . . . . . . . . . . . . . . . . . 113
Contents 3
3.6 Threads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

3.7 Multiprocessing & Multitasking . . . . . . . . . . . . . . . . . . . . . . . . . . 118
3.8 TASK COMMUNICATION: . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
3.9 Pipes: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
3.10 TASK SYNCHRONISATION: . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
3.11 Deadlock . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
4 EMBEDDED SOFTWARE DEVELOPMENT TOOLS 170

4.1 HOST AND TARGET MACHINES: . . . . . . . . . . . . . . . . . . . . . . . . 170
4.1.1 Cross Assemblers and Tool Chains: . . . . . . . . . . . . . . . . . . . . 172
4.1.2 Tool chain for building embedded software shown below: . . . . . . . . 172
4.2 LINKER/LOCATORS FOR EMBEDDED SOFTWARE: . . . . . . . . . . . . . 173
4.2.1 Linking Process shown below: . . . . . . . . . . . . . . . . . . . . . . . 173
4.2.2 Native Tool Chain: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
4.3 Output File Formats: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
4.3.1 Intel Hex file format: . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
4.3.2 Motorola S-Record format . . . . . . . . . . . . . . . . . . . . . . . . . 176
4.3.3 Loading program components properly: . . . . . . . . . . . . . . . . . . 176
4.4 Locator Maps: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
4.4.1 Executing out of RAM: . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
4.5 ROM Emulators: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
4.6 In circuit emulators: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
4.6.1 Flash: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
4.6.2 Monitors: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
4.7 DEBUGGING TECHNIQUES . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
4.7.1 Testing on host machine : . . . . . . . . . . . . . . . . . . . . . . . . . 183
4.8 Using laboratory Tools: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
4.8.1 Volt meters: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
4.8.2 Ohm Meters: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
4.8.3 Oscilloscopes: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
4.8.4 Logic Analyzers: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
4.8.5 Logical analyzers in Timing Mode: . . . . . . . . . . . . . . . . . . . . 193
4.9 In circuit emulators: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
4.9.1 Software only Monitors: . . . . . . . . . . . . . . . . . . . . . . . . . . 197
4.9.2 Disadvantages of Monitors: . . . . . . . . . . . . . . . . . . . . . . . . 197
5 INTRODUCTION TO ADVANCED PROCESSORS 199

5.1 INTRODUCTION TO ADVANCED ARCHITECTURES: . . . . . . . . . . . . 199
5.1.1 Von Neumann architecture computer: . . . . . . . . . . . . . . . . . . . 200
5.1.2 Harvard architecture: . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
5.2 ARM(Advanced RISC Machine) Processor: . . . . . . . . . . . . . . . . . . . . 202
5.2.1 Memory Organization in ARM Processor: . . . . . . . . . . . . . . . . . 203
5.2.2 Data Operations in ARM: . . . . . . . . . . . . . . . . . . . . . . . . . 203
5.3 Types of Instructions supported by ARM Processor: . . . . . . . . . . . . . . . . 205
5.4 ARM Register indirect addressing: . . . . . . . . . . . . . . . . . . . . . . . . . 207
Contents 4
5.5 SHARC Processor: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209

5.6 SHARC PROCESSOR PROGRAMMING MODEL: . . . . . . . . . . . . . . . 211
5.6.1 Status registers: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
5.6.2 Multifunction computations or instruction level parallel processing: . . . 214
5.7 Pipelining in SHARC processor: . . . . . . . . . . . . . . . . . . . . . . . . . . 214
5.7.1 Addressing modes provided by DAG in SHARC Processor: . . . . . . . 215
5.7.2 BASIC addressing: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
5.7.3 ARM vs. SHARC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
5.7.4 The SHARC ALU operations: . . . . . . . . . . . . . . . . . . . . . . . 218
5.8 I2 C : . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
5.8.1 Bus transaction or transmission process: . . . . . . . . . . . . . . . . . . 222
5.8.2 Transmitting byte in I2 C Bus (Timing Diagram): . . . . . . . . . . . . . 223
5.8.3 I2 C interface on a microcontroller: . . . . . . . . . . . . . . . . . . . . . 223
5.9 Controlled Area Network: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
5.9.1 CAN DATA FRAME: . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
5.9.2 Architecture of CAN controller: . . . . . . . . . . . . . . . . . . . . . . 225
5.10 INTERNET ENABLED SYSTEMS: . . . . . . . . . . . . . . . . . . . . . . . . 226
5.10.1 IP Protocol: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
5.10.2 IP Packet Format: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
List of Figures
1.1 General block diagram of a digital camera . . . . . . . . . . . . . . . . . . . . . 12

1.2 Major levels of abstraction in the design process . . . . . . . . . . . . . . . . . . 15
1.3 GPS moving map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.4 Requirements form for GPS moving map system . . . . . . . . . . . . . . . . . 19
1.5 Block diagram for the moving map . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.6 Hardware of moving map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.7 Hardware of moving map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.8 An object in UML notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.9 A class in UML notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.10 Derived classes as a form of generalization in UML . . . . . . . . . . . . . . . . 25
1.11 Multiple inheritance in UML. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.12 Links and association. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.13 A state and transition in UML. . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.14 Signal, call, time-out events in UML. . . . . . . . . . . . . . . . . . . . . . . . . 28
1.15 A state machine specification in UML. . . . . . . . . . . . . . . . . . . . . . . . 29
1.16 A sequence diagram in UML. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
1.17 A model train control system. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
1.18 A model train control system. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
1.19 A UML sequence diagram for a typical sequence of train control commands. . . 34
1.20 Class diagram for the train controller messages. . . . . . . . . . . . . . . . . . . 34
1.21 UML collaboration diagram for the major subsystems of the train controller system. 35
1.22 A UML class diagram for the train controller showing the composition of the sub-
systems. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
1.23 Classes describing analog physical objects in the train control systems. . . . . . . 37
1.24 Controlling motor speed by pulse width modulation. . . . . . . . . . . . . . . . 38
1.25 Class diagram for the Panel and Motor Interface. . . . . . . . . . . . . . . . . . 38
1.26 Class diagram for the transmitter and Receiver. . . . . . . . . . . . . . . . . . . 39
1.27 Class diagram for the Formatter class. . . . . . . . . . . . . . . . . . . . . . . . 39
1.28 Sequences diagram for transmitting a control input. . . . . . . . . . . . . . . . . 40
1.29 Sequences diagram for transmitting a control input. . . . . . . . . . . . . . . . . 41
1.30 State diagram for the panel active behaviour. . . . . . . . . . . . . . . . . . . . . 41
1.31 Class diagram for the controller class. . . . . . . . . . . . . . . . . . . . . . . . 42
1.32 State diagram for the controller operate behaviour. . . . . . . . . . . . . . . . . . 42
1.33 Sequence diagram for a set-speed command received by the train. . . . . . . . . 43
2.1 C compiler register usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5
List of Figures 6
2.2 ATPCS Argument passing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

2.3 Compiler vs. Cross-Compiler . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.4 C Keywords . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
2.5 Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
2.6 Arithmetic Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
2.7 Logical Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
2.8 Looping Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
2.9 char arr [5] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
2.10 Create new uVision project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
2.11 Select your device model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
2.12 Message window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
2.13 create new file and save it with .C . . . . . . . . . . . . . . . . . . . . . . . . . 63
2.14 Project window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
2.15 Right click on target and click on options for target . . . . . . . . . . . . . . . . 64
2.16 create Hex file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
2.17 Add files to source group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
2.18 Build target . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
2.19 Build target . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
2.20 pin diagram of 8051 microcontroller . . . . . . . . . . . . . . . . . . . . . . . . 67
2.21 Reading and writing bits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
2.22 SWITCH BOUNCE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
2.23 LED INTERFACING WITH 8051 TO A SINGLE PIN . . . . . . . . . . . . . . 70
2.24 LED’s interfacing with Port, P1 of 8051 . . . . . . . . . . . . . . . . . . . . . . 72
2.25 4X4 MATRIX KEYPAD INTERFACING WITH 8051 MICROCONTROLLER . 73
2.26 Interfacing keypad with 8051 microcontroller (P89V51RD2) . . . . . . . . . . . 74
2.27 FINDING COLUMN NUMBER . . . . . . . . . . . . . . . . . . . . . . . . . . 75
2.28 FINDING ROW NUMBER . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
2.29 7 SEGMENT DISPLAY INTERFACING WITH 8051 MICROCONTROLLER . 79
2.30 7 SEGMENT DISPLAY INTERFACING DISCRIPTION . . . . . . . . . . . . . 80
2.31 LCD Display . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
2.32 Pin Desciption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
2.33 Pin Desciption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
2.34 LCD DISPLAY INTERFACING WITH 8051 MICROCONTROLLER . . . . . . 84
2.35 ADC (ADC0808) INTERFACING WITH 8051 MICROCONTROLLER . . . . . 87
2.36 Clock input . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
2.37 INTERFACING WITH 8051 MICROCONTROLLER . . . . . . . . . . . . . . 88
2.38 8051 Connection to DAC808 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
2.39 Generating a sine wave . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
2.40 Angle vs. Voltage Magnitude for Sine Wave . . . . . . . . . . . . . . . . . . . . 94
2.41 Interrupt Vector Table for the 8051 . . . . . . . . . . . . . . . . . . . . . . . . . 95
2.42 Interrupt Enable) Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
2.43 Interrupt Enable) Register table . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
2.44 Calculation of baud rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
2.45 Timer 1 THI register values for different baud rates . . . . . . . . . . . . . . . . 99
2.46 Comparison of Baud rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
List of Figures 7
3.1 Architecture of Operating System . . . . . . . . . . . . . . . . . . . . . . . . . 102

3.2 Monolithic Kernel Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
3.3 Structure of a Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
3.4 Memory organization of a Process . . . . . . . . . . . . . . . . . . . . . . . . . 114
3.5 Process states and State transition . . . . . . . . . . . . . . . . . . . . . . . . . 115
3.6 Memory organization of process and its associated Threads . . . . . . . . . . . 116
3.7 Multitasking Context Switching . . . . . . . . . . . . . . . . . . . . . . . . . . 119
3.8 Non-preemptive scheduling – First Come First Served (FCFS)/First In First Out
(FIFO) Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
3.9 Round Robin Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
3.10 Concept of shared memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
3.11 unidirectional pipe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
3.12 message queue based indirect messaging for IPC . . . . . . . . . . . . . . . . . 142
3.13 mailbox based indirect messaging for IPC . . . . . . . . . . . . . . . . . . . . . 143
3.14 Concept of Remote Procedure Call (RPC) for IPC . . . . . . . . . . . . . . . . . 144
3.15 Concept of Remote Procedure Call (RPC) for IPC . . . . . . . . . . . . . . . . . 144
3.16 Race condition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
3.17 Deadlock Visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
3.18 Deadlock leading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
3.19 Visualization of the ‘Dining Philosophers’ Problem’ . . . . . . . . . . . . . . . 153
3.20 ’Real Problems’ in the ’Dining Philosophers ’ . . . . . . . . . . . . . . . . . . . 155
3.21 Output of win32 program illustrating producer-consumer problem . . . . . . . . 158
3.22 Priority Inversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
3.23 Priority Inheritance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
3.24 Handling Priority Inversion problem with priority Ceiling . . . . . . . . . . . . . 162
3.25 Role of device driver in Embedded OS based products . . . . . . . . . . . . . . 166
4.1 Difference between host and target system . . . . . . . . . . . . . . . . . . . . . 171

4.2 The figure shows the process of building software for an embedded system. . . . 172
4.3 An illustration of linking process. . . . . . . . . . . . . . . . . . . . . . . . . . 173
4.4 Native tool chain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
4.5 Intel Hex file format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
4.6 Motorola S-Record format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
4.7 How the tool chain uses segments . . . . . . . . . . . . . . . . . . . . . . . . . 177
4.8 Locator places segments in memory . . . . . . . . . . . . . . . . . . . . . . . . 178
4.9 Locator MAP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
4.10 Semantic edge view of socket . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
4.11 Test System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
4.12 Test scaffold for the barcode scanner software . . . . . . . . . . . . . . . . . . 188
4.13 Typical Oscilloscope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
4.14 A Reasonable clock signal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
4.15 A Questionable clock signal . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
4.16 A dead clock signal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
4.17 A ROM chip selection signal . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
4.18 Logic analyzer timing display: Button and Alarm signal . . . . . . . . . . . . . . 194
List of Figures 8
4.19 Logic Analyzer timing Display: Data and RTS signal . . . . . . . . . . . . . . . 195
4.20 Logic analyzer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
4.21 Typical logic analyzer state mode display . . . . . . . . . . . . . . . . . . . . . 196
4.22 software only the monitor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
5.1 Von Neumann architecture computer . . . . . . . . . . . . . . . . . . . . . . . . 200

5.2 Harvard architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
5.3 Differences between Von neumann and harvard architecture . . . . . . . . . . . . 201
5.4 Difference between RISC and CISC Processors . . . . . . . . . . . . . . . . . . 202
5.5 Byte organization within word . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
5.6 Basic ARM programming model . . . . . . . . . . . . . . . . . . . . . . . . . . 204
5.7 Arithmetic Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
5.8 MOV Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
5.9 Load store Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
5.10 Register Indirect ddressing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
5.11 offset addressing mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
5.12 Condition codes in ARM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
5.13 example for flow of control programs . . . . . . . . . . . . . . . . . . . . . . . 210
5.14 Procedure call in ARM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
5.15 SHARC Programming model . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
5.16 Data Address Generators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
5.17 DAG 1 Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
5.18 DAG 2 Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
5.19 Fixed point ALU operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
5.20 Floating point ALU operations . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
5.21 Shifter operations in SHARC . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
5.22 I2 C bus system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
5.23 Electrical Interface to I2 C bus . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
5.24 Format of I2 C adress transmission . . . . . . . . . . . . . . . . . . . . . . . . . 222
5.25 Typical bus transmission . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
5.26 State transition of I2 C bus master . . . . . . . . . . . . . . . . . . . . . . . . . 222
5.27 Transmitting byte in I2 C Bus . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
5.28 I2 C interface on a microcontroller . . . . . . . . . . . . . . . . . . . . . . . . . 223
5.29 Physical and electrical organization of CAN bus . . . . . . . . . . . . . . . . . 224
5.30 CAN DATA FRAME . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
5.31 Architecture of CAN controller . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
5.32 Protocal utilization in internet communication . . . . . . . . . . . . . . . . . . . 227
5.33 IP Packet structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
5.34 Internet service stack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
Chapter 1
EMBEDDED COMPUTING
Course Outcomes
After successful completion of this module, students should be able to:
CO 1 Summarize the concepts of Embedded Systems and formalisms for Understand
system design with examples.
1.1 Definition of Embedded System
System Definition:
A way of working, organizing or performing one or many tasks according to a fixed set of rules,
program or plan. Also an arrangement in which all units assemble and work together according to
a program or plan.
Examples of Systems:
• Time display system – A watch
• Automatic cloth washing system – A washing machine
Embedded System Definition:

“An embedded system is a system that has software embedded into computer-hardware, which
makes a system dedicated for an application (s) or specific part of an application or product or part
of a larger system.”
1
Chapter 1. EMBEDDED COMPUTING 2
(Or)
An embedded system is one that has dedicated purpose software embedded in computer hardware.
(Or)
It is a dedicated computer based system for an application(s) or product. It may be an independent
system or a part of large system. Its software usually embeds into a ROM (Read Only Memory)
or flash.”
(Or)
It is any device that includes a programmable computer but is not itself intended to be a general
purpose computer.”
In simple words, Embedded System = (Hardware + Software) dedicated for a particular task
with its own memory.
1.1.1 The components of embedded system hardware:
MICROPROCESSOR:
Microprocessor is a multipurpose, programmable device that accepts digital data as input, pro-
cesses it according to instructions stored in its memory, and provides results as output.
(or)
A microprocessor is a multipurpose, programmable, clock-driven, register-based electronic device
that reads binary instructions from a storage device called memory accepts binary data as input
and processes data according to instructions, and provides result as output.
MICROCONTROLLER:
A microcontroller (sometimes abbreviated µC, µC or MCU) is a small computer on a single inte-

grated circuit containing a processor core, memory, and programmable input/output peripherals.
Program memory in the form of NOR flash or OTP ROM is also often included on chip, as well as
a typically small amount of RAM.
or
CPUs with integrated memory or peripheral interfaces.
DIGITAL SIGNAL PROCESSOR:
Dedicated processors .A digital signal processor (DSP) is a specialized microprocessor (or a SIP
block), with its architecture optimized for the operational needs of digital signal processing.
IMAGE PROCESSOR:
An image processor, image processing engine, also called media processor, is a specialized digital
signal processor (DSP) used for image processing in digital cameras, mobile phones or other
devices.
EMBEDDED COMPUTING SYSTEM:
An embedded system is a special-purpose system in which the computer is completely encapsu-

lated by the device it controls. Unlike a general-purpose computer, such as a personal computer,
an embedded system performs pre-defined tasks, usually with very specific requirements. Since
the system is dedicated to a specific task, design engineers can optimize it, reducing the size and
cost of the product.
Some examples of embedded systems include ATMs, cell phones, printers, thermostats, calcula-
tors, and videogame consoles.
1.2 Embedded Systems Vs. General Computing Systems:
The computing revolution began with the general purpose computing requirements. Later it was
realized that the general computing requirements are not sufficient for the embedded computing
requirements. The embedded computing requirements demand ’something special’ in terms of
response to stimuli, meeting the computational deadlines, power efficiency, limited memory avail-
ability, etc. Let’s take the case of your personal computer, which may be either a desktop PC or a
laptop PC or a palmtop PC. It is built around a general purpose processor like an Intel® Centrino
or a Duo/Quad core or an AMD Turion processor and is designed to support a set of multiple
peripherals like multiple USB 2.0 ports, Wi-Fi, ethernet, video port, IEEE1394, SD/CF/MMC
external interfaces, Bluetooth, etc and with additional interfaces like a CD read/writer, on-board
Hard Disk Drive (MD), gigabytes of RAM, etc. You can load any supported operating system (like
Windows® XP/Vista/7, or Red Hat Linux/Ubuntu Linux, UNIX etc) into the hard disk of your PC.
You can write or purchase a multitude of applications for your PC and can use your PC for running
a large number of applications (like printing your dear’s photo using a printer device connected to
the PC and printer software, creating a document using Micro-soft® Office Word tool, etc.) Now
let us think about the DVD player you use for playing DVD movies.
Is it possible for you to change the operating system of your DVD? Is it possible for you to write
an application and download it to your DVD player for executing? Is it possible for you to add a
printer soft-ware to your DVD player and connect a printer to your DVD player to take a printout?
Is it possible for you to change the functioning of your DVD player to a television by changing the
embedded software? The answers to all these questions are ’NO’. Can you see any general purpose
interface like Bluetooth or Wi-Fi on your DVD player? Of course ’NO’. The only ’interface you
can find out on the DVD player is the interface for connecting the DVD player with the display
screen and one for controlling the DVD player through a remote (May be an IR or any other
specific wireless interface). Indeed your DVD player is an embedded system designed specifically
for decoding digital video and generating a video signal as output to your TV or any other display
screen which supports the display inter-face supported by the DVD Player. Let us summarize our
findings from the comparison of embedded system and general purpose computing system with
the help of a table:
General Purpose Computing System Embedded Systems

A system which is a combination of generic A system which is a combination of special
hardware and General Purpose Operating purpose hardware and embedded OS for exe-
System for executing a variety of applications cuting a specific set of applications
Contain a General Purpose Operating System May or may not contain an operating system
(GPOS) for functioning
Applications are alterable (programmable) by The firmware of the embedded system is pre-
user (It is possible for the end user to re-install programmed and it is non-alterable by end-
the Operating System, and add or remove user user
applications)
Performance is the key deciding factor on the Application specific requirements (like per-
selection of the system. Always ”Faster is formance, power requirements, memory us-
Better” age etc) are the key deciding factors
Less/not at all tailored towards reduced oper- Highly tailored to take advantage of the power
ating power requirements options for different saving modes supported by hardware and Op-
levels of power management. erating System
Response requirements are not time critical For certain category of embedded systems
like mission critical systems, the response
time requirement is highly critical
Need not be deterministic in execution behav- Execution behavior is deterministic for cer-
ior tain type of embedded systems like ”Hard
Real Time” systems
1.3 History of Embedded Systems
Embedded systems were in existence even before the IT revolution. In the olden days embedded
systems were built around the old vacuum tube and transistor technologies and the embedded al-
gorithm was developed in low level languages. Advances in semiconductor and nano-technology
and IT revolution gave way to the development of miniature embedded systems. The first rec-
ognized modem embedded system is the Apollo Guidance Computer (AGC) developed by the
MIT Instrumentation Laboratory for the lunar expedition. They ran the inertial guidance systems
of both the Command Module (CM) and . the Lunar Excursion Module (LEM). The Command
Module was designed to encircle the moon while the Lunar Module and its crew were designed
to go down to the moon surface and land there safely. The Lunar Module featured in total 18
engines. There were 16 reaction control thrusters, a descent engine and an ascent engine. The
descent engine was ’designed to’ provide thrust to the lunar module out of the lunar orbit and
land it safely on the moon. MIT’s original design was based on 4K words of fixed memory (Read
Only Memory) and 256 words of erasable memory (Random Access Memory). By June 1963, the
figures reached 10K of fixed and 1K of erasable memory. The final configuration was 36K words
of fixed memory and 2K words of erasable memory. The clock frequency of the first microchip
proto model used in AGC was 1.024 MHz and it was derived from a 2.048 MHz crystal clock. The
computing unit of AGC consisted of approximately 11 instructions and 16 bit word logic. Around
5000 ICs (3-input NOR gates, RTL logic) supplied by Fairchild Semiconductor were used in this
design. The user interface unit of AGC is known as DSKY (display/keyboard). DSKY looked like
a calculator type keypad with an array of numerals. It was used for inputting the commands to the
module numerically.
The first mass-produced embedded system was the guidance computer for the Minuteman-I mis-
sile in 1961. It was the Autonetics D-17 guidance computer, built using discrete transistor logic
and a hard-disk for main memory. The first integrated circuit was produced in September 1958 but
computers using them didn’t begin to appear until 1963. Some of their early uses were in embed-
ded systems, notably used by NASA for the Apollo Guidance Computer and by the US military in
the Minuteman-II intercontinental ballistic missile.
1.4 Classification of Embedded Systems
It is possible to have a multitude of classifications for embedded systems, based on different crite-
ria. Some of the criteria used in the classification of embedded systems are:
1. Based on generation.
2. Complexity and performance requirements.
3. Based on deterministic behaviour.
4. Based on triggering.
The classification based on deterministic system behaviour is applicable for ’Real Time’ systems.
The application/task execution behaviour for an embedded system can be either deterministic or
non-deterministic. Based on the execution behaviour, Real Time embedded systems are classified
into Hard and Soft. We will discuss about hard and soft real time systems in a later chapter.
Embedded Systems which are ’Reactive’ in nature (Like process control systems in industrial
control applications) can be classified based on the trigger. Reactive systems can be either event
triggered or time triggered.
1.4.1 Classification Based on Generation
This classification is based on the order in which the embedded processing systems evolved from
the first version to where they are today. As per this criterion, embedded systems can be classified
into:
1.4.1.1 First Generation:
The early embedded systems were built around 8bit microprocessors like 8085 and Z80, and 4bit
microcontrollers. Simple in hardware circuits with firmware developed in Assembly code. Digital
telephone keypads, stepper motor control units etc. are examples of this.
1.4.1.2 Second Generation:
These are embedded systems built around 16bit microprocessors and 8 or 16 bit microcontrollers,
following the first generation embedded systems. The instruction set for the second generation
processors/controllers were much more complex and powerful than the first generation proces-
sors/controllers. Some of the second generation embedded systems contained embedded operat-
ing systems for their operation. Data Acquisition Systems, SCADA systems, etc. are examples of
second generation embedded systems.
1.4.1.3 Third Generation:
With advances in processor technology, embedded system developers started making use of power-
ful 32bit processors and 16bit microcontrollers for their design. A new concept of application and
domain specific processors/controllers like Digital Signal Processors (DSP) and Application Spe-
cific Integrated Circuits (ASICs) came into the picture. The instruction set of processors became
more complex and powerful and the concept of instruction pipelining also evolved. The proces-
sor market was flooded with different types of processors from different vendors. Processors like
Intel Pentium, Motorola 68K, etc. gained attention in high performance embedded requirements.
Dedicated embedded real time and general purpose operating systems entered into the embedded
market. Embedded systems spread its ground to areas like robotics, media, industrial process
control, networking, etc.
1.4.1.4 Fourth Generation:
The advent of System on Chips (SoC), reconfigurable processors and multicore processors are
bringing high performance, tight integration and miniaturisation into the embedded device market.
The SoC technique implements a total system on a chip by integrating different functionalities with
a processor core on an integrated circuit. We will discuss about SoCs in a later chap-ter. The fourth
generation embedded systems are making use of high performance real time embedded operating
systems for their functioning. Smart phone devices, mobile internet devices (MIDs), etc. are
examples of fourth generation embedded systems.
1.4.1.5 What Next?
The processor and embedded market is highly dynamic and demanding. So ‘what will be the next
smart move in the next embedded generation?’ Let’s wait and see.
1.4.2 Classification Based on Complexity and Performance
This classification is based on the complexity and system performance requirements. According
to this classification, embedded systems can be grouped into:
1.4.2.1 Small-Scale Embedded Systems
Embedded systems which are simple in application needs and where the performance requirements
are not time critical fall under this category. An electronic toy is a typical example of a small-scale
embedded system. Small-scale embedded systems are usually built around low performance and
low cost 8’ dr 16 bit microprocessors/microcontrollers. A small-scale embedded system may or
may not contain an operating system for its functioning.
1.4.2.2 Medium-Scale Embedded Systems
Embedded systems which are slightly complex in hardware and firmware (software) requirements
fall under this category. Medium-scale embidded systems are usually built around medium per-
formance, low cost 16 or 32 bit microprocessors/microcon-. trollers or digital signal processors.
They usually contain an embedded operating system (either general purpose or real time operating
system) for functioning.
1.4.2.3 Large-Scale Embedded Systems/Complex Systems
Embedded systems which involve highly complex hardware and firmware requirements fall under
this category. They are em-ployed in mission critical applications demanding high performance.
Such systems are commonly built around high performance 32 or 64 bit RISC processors/con-
trollers or Reconfigurable System on Chip (RSoC) or multi-core processors and programmable
logic devices. They may contain multiple proces-sors/controllers and co-units/hardware acceler-
ators for offloading the processing requirements from the main processor of the system. Decod-
ing/encoding of media, cryptographic function implementation, etc. are examples for processing
requirements which can be implemented using a co-processor/hard-ware accelerator. Complex
embedded systems usually contain a high performance Real Time Operating System (RTOS) for
task scheduling, prioritization and management:
1.4.3 Classification Based on deterministic behavior
This classification is applicable for “Real Time” systems. The task execution behavior for an
embedded system may be deterministic or non-deterministic. Based on execution behavior Real
Time embedded systems are divided into two types
• Hard Real Time embedded systems: Missing a program/task execution time deadline can have
catastrophic consequences (financial, human loss of life, etc.)
• Soft Real Time embedded systems: Missing a deadline may not be critical and can be tolerated
to a certain degree
1.4.4 Classification Based on triggering
Embedded systems which are “Reactive” in nature can be based on triggering. Reactive systems
can be:
• Event triggered: Activities within the system (e.g., task run-times) are dynamic and depend
upon occurrence of different events .
• Time triggered: Activities within the system follow a statically computed schedule (i.e., they
are allocated time slots during which they can take place) and thus by nature are predictable.
1.5 Complex Systems and Microprocessors
Embedded computer system is a physical system that employs computer control for a specific
purpose rather than for general purpose computation. (OR) Embedded system is a special-purpose
computer system designed to perform one or few dedicated functions.
Since, the embedded system is dedicated to perform specific tasks, design engineers can optimize
it, reducing the size and cost of the product or increasing the reliability and performance.
Application areas:
1. Automotive electronics
2. Aircraft electronics
3. Trains
4. Telecommunication
5. Medical systems
6. Military applications
7. Authentication
8. Consumer electronics
9. Fabrication equipment
10. Robotics
Examples of Embedded Systems:
Embedded systems have very diversified applications. A few examples of embedded system ap-
plications are as follows
1.5.1 Digital camera:
A digital camera is a popular consumer electronic device that can capture images or take pictures
and store them in a digital format. A digital camera is the best example of Embedded system being
very widely used all over the world. It includes a powerful processor capable of handling complex
signal processing operations because image capturing, storing and displaying is involved.
Fig. 1.1 shows a general block diagram of a digital camera.

F IGURE 1.1: General block diagram of a digital camera
1. In a digital camera, an array of optical sensor is used to capture image. These sensors are the
photodiodes which convert light intensity into electrical voltage. Charge-Coupled Devices
(CCD) is a special type of sensor which can give high quality images.
2. An image is made up of picture elements (pixels) of different luminance and chrominance.

The number of pixels in a picture determines the quality of the picture, which is called
resolution of the image. More the number of pixels, higher is the resolution and better is the
quality. Each sensing element generates a charge that corresponds to one pixel. Since, the
charge is analog quantity; it is converted into digital form using ADC.
3. After ADC, a digital representation of an image is obtained. The colour and intensity of
each pixel is represented by number of bits. The digitized image is now considered as a
two dimensional matrix having p x q pixels and each pixel is given a equivalent decimal or
hexadecimal number.
4. The important functional block of a digital camera is the system controller. System con-
troller consists of a processor, memory and different interface circuitry to connect to other
parts of the system. Processor processes the digitized image data to generate images rep-
resented in standard formats suitable for use in computers, printers and display devices.
Processing tasks performed on an image are: brightness modification, contrast stretching,
smoothing, sharpening etc. For compressed images, the format is called JPEG (joint Pho-
tographic Experts Group) while for uncompressed images, the format used in BMP (Bit
Mapped) or TIFF (Tagged Image File Format).
5. The processed images are stored in Image storage device. Flash memory cards, floppy
disks or miniature hard drives can be used for this purpose. The number of images that
can be taken and saved depends on the capacity of image storage device. It also depends
on the resolution of the image. Higher resolution image takes more memory space because
the number of pixels is more. An uncompressed image takes more memory space than a
compressed image.
6. The image is displayed on a LCD screen in the camera. The LCD display normally con-
sumes more power than the processor.
7. A standard computer interface provides a mechanism for transferring images to a computer

or a printer. This interface may be serial or parallel interface or a connector for a standard
bus such as PCI or USB. If flash cards are used, images can be transferred by physically
removing the card and inserting into the slot of the computer.
8. There are some other electromechanical parts such as switches to be operated by the user,
motor to rotate the camera for focussing purposes etc. The system controller must be capable
generating signals to co-ordinate and control all such activities.
1.6 Major Application Areas of Embedded Systems
We can find applications of embedded systems in following sectors:
Daily Life Electronic appliances (Lift, Microwave Oven, Refrigerator, Washing Machine).
Health Care (X-ray, ECG, Cardiograph, diseases diagnosis devices etc).
Education (Laptop or desktop, projector, printer, calculator, lab equipments etc).
Communication (Mobile phone, satellite, Modem, Network Hub, Router, Telephone, Fax).
Security System (CC Camera, X ray Scanner, RFID System, Password protected door, Face de-
tection).
Entertainment (Television etc).
Banking System (ATM etc).
Automation.
Navigation.
Consumer Electronics: Camcorders, Cameras.
Household appliances: Washing machine, Refrigerator.
Automotive industry: Anti-lock breaking system(ABS), engine control.
Home automation & security systems: Air conditioners, sprinklers, fire alarms.
Telecom: Cellular phones, telephone switches.
Computer peripherals: Printers, scanners.
Computer networking systems: Network routers and switches.
Healthcare: EEG, ECG machines.
Banking & Retail: Automatic teller machines, point of sales.
Card Readers: Barcode, smart card readers.
1.7 The Embedded System Design Process
The embedded system design process aimed at two objectives.

o First, it will give us an introduction to the various steps in embedded system design.
o Second, it will allow us to consider the design methodology itself.
A design methodology is important for three reasons.
o First, to ensure that we have done everything we need.
o Second, it allows us to develop computer-aided design tools.
o Third, it makes members of a design team to communicate easily Designing can be done in two
ways. They are
1. Top down
2. Bottom up
Figure 1.1 summarizes the major steps in the embedded system design process. In this top–down
view, we start from the system requirements. In bottom up approach we start with components.
Specification, we create a more detailed description of what we want. But the specification states
only how the system behaves, not how it is built. The details of the system’s internals begin to take
shape when we develop the architecture, which gives the system structure in terms of large compo-
nents. Once we know the components we need, we can design those components, including both
software modules and any specialized hardware we need. Based on those components, we can
finally build a complete system. In this section we will consider design from the top–down—we
will begin with the most abstract description of the system.
The alternative is a bottom–up view in which we start with components to build a system. Bot-
tom–up design steps are shown in the figure as dashed-line arrows. We need bottom–up design
because we do not have perfect insight into how later stages of the design process will turn out.
F IGURE 1.2: Major levels of abstraction in the design process
We need to consider the major goals of the design:

• Manufacturing cost;
• Performance (both overall speed and deadlines); and
• Power consumption.
We must also consider the tasks we need to perform at every step in the design process. At each
step in the design, we add detail:
• We must analyze the design at each step to determine how we can meet the specifications.
• We must then refine the design to add detail.
• And we must verify the design to ensure that it still meets all system goals, such as cost, speed,
and so on.
1. Requirements: Clearly, before we design a system, we must know what we are designing. The
initial stages of the design process capture this information for use in creating the architecture and
components. We generally proceed in two phases:
1. First, we gather an informal description from the customers known as requirements;
2. Second we refine the requirements into a specification that contains enough information to begin
designing the system architecture.
Separating out requirements analysis and specification is often necessary because of the large gap
between what the customers can describe about the system they want and what the architects need
to design the system. Requirements may be functional or non functional.
Typical Non functional requirements include:
• Performance: The speed of the system is often a major consideration both for the usability of
the system and for its ultimate cost. As we have noted, performance may be a combination of soft
performance metrics such as approximate time to perform a user-level function and hard deadlines
by which a particular operation must be completed.
•Cost: The target cost or purchase price for the system is almost always a consideration. Cost
typically has two major components:
• Manufacturing cost includes the cost of components and assembly.
• Nonrecurring engineering (NRE) costs include the personnel and other costs of designing
the system.
• Physical size and weight: The physical aspects of the final system can vary greatly depending
upon the application. An industrial control system for an assembly line may be designed to fit into
a standard-size rack with no strict limitations on weight. A handheld device typically has tight
requirements on both size and weight that can ripple through the entire system design.
• Power consumption: Power, of course, is important in battery-powered systems and is often
important in other applications as well. Power can be specified in the requirements stage in terms
of battery life.
Validating a set of requirements is ultimately a psychological task since it requires understanding
both what people want and how they communicate those needs. One good way to refine at least
the user interface portion of a system’s requirements is to build a mock-up. The mock-up may use
scanned data to simulate functionality in a restricted demonstration, and it may be executed on a
PC or a workstation.
Requirements analysis for big systems can be complex and time consuming. However, capturing
a relatively small amount of information in a clear, simple format is a good start towards under-
standing system requirements. As part of system design to analyze requirements, we will use a
simple requirements methodology. Figure 1.2 shows a sample requirements form that can be filled
out at the start of the project.
Let’s consider the entries in the form:
TABLE 1.2: Sample requirements form
Name:
Purpose:
Inputs:
Outputs:
Functions:
Performance:
Manufacturing Cost:
Power:
Physical size and weight:
Name: This is simple but helpful. Giving a name to the project should tell the purpose of the
machine.
Purpose: This should be a brief one- or two-line description of what the system is supposed to
do. If you can’t describe the essence of your system in one or two lines, chances are that you don’t
understand it well enough.
Inputs and outputs: These two entries are more complex than they seem. The inputs and outputs
to the system encompass a wealth of detail:
— Types of data: Analog electronic signals? Digital data? Mechanical inputs? — Data character-
istics: Periodically arriving data, such as digital audio samples? How many bits per data element?
— Types of I/O devices: Buttons? Analog/digital converters? Video displays? Functions: This
is a more detailed description of what the system does. A good way to approach this is to work
from the inputs to the outputs: When the system receives an input, what does it do? How do user
interface inputs affect these functions? How do different functions interact?
Performance: Many embedded computing systems spend at least some time to control physical
devices or processing data coming from the physical world. In most of these cases, the computa-
tions must be performed within a certain time.
Manufacturing cost: This includes primarily the cost of the hardware components. Even if you
don’t know exactly how much you can afford to spend on system components, you should have
some idea of the eventual cost range. Cost has a substantial influence on architecture.
Power: Similarly, you may have only a rough idea of how much power the system can consume,
but a little information can go a long way. Typically, the most important decision is whether the
machine will be battery powered or plugged into the wall. Battery-powered machines must be
much more careful about how they spend energy.
Physical size and weight: You should give some indication of the physical size of the system
that helps to take architectural decisions. After writing the requirements, you should check them
for internal consistency. To practice the capture of system requirements, Example 1.1 creates the
requirements for a GPS moving map system.
Example 1.1
Requirements analysis of a GPS moving map
The moving map is a handheld device that displays for the user a map of the terrain around the
user’s current position; the map display changes as the user and the map device change position.
The moving map obtains its position from the GPS, a satellite-based navigation system. The mov-
ing map display might look something like the following figure.
F IGURE 1.3: GPS moving map

What requirements might we have for our GPS moving map? Here is an initial list:
Functionality: This system is designed for highway driving and similar uses. The system should
show major roads and other landmarks available in standard topographic databases.
User interface: The screen should have at least 400 600 pixel resolution. The device should be
controlled by no more than three buttons. A menu system should pop up on the screen when
buttons are pressed to allow the user to make selections to control the system.
Performance: The map should scroll smoothly. Upon power-up, a display should take no more
than one second to appear, and the system should be able to verify its position and display the
current map within 15 sec.
Cost: The selling cost of the unit should be no more than $100.
Physical size and weight: The device should fit comfortably in the palm of the hand.
Power consumption: The device should run for at least eight hours on four batteries.
Requirements form for GPS moving map system:
F IGURE 1.4: Requirements form for GPS moving map system
2. Specification: The specification is more precise—it serves as the contract between the cus-
tomer and the architects.
The specification must be carefully written so that it accurately reflects the customer’s require-
ments and that can be clearly followed during design. An unclear specification leads different
types of problems.
If the behaviour of some feature in a particular situation is unclear from the specification, the de-
signer may implement the wrong functionality.
If global characteristics of the specification are wrong or incomplete, the overall system architec-
ture derived from the specification may be inadequate to meet the needs of implementation.
A specification of the GPS system would include several components:
• Data received from the GPS satellite constellation.
• Map data
• User interface.
• Operations that must be performed to satisfy customer requests.
• Background actions required to keep the system running, such as operating the GPS receiver.
3. Architecture Design:
The architecture is a plan for the overall structure of the system that will be used later to design
the components that make up the architecture. Tounderstand what an architectural description is,
let’s look at sample architecture for the moving map of Example 1.1.
Figure 1.3 shows a sample system architecture in the form of a block diagram that shows major
operations and data flows among them.
F IGURE 1.5: Block diagram for the moving map
The topographic database and to render (i.e., draw) the results for the display. We have chosen
to separate those functions so that we can potentially do them in parallel— performing rendering
separately from searching the database may help us update the screen more fluidly.
For more implementation details we should refine that system block diagram into two block dia-
grams:
o Hardware block diagram (Hardware architecture)
o Software block diagram(Software architecture)
These two more refined block diagrams are shown in Figure 1.4
The hardware block diagram clearly shows that we have one central CPU surrounded by memory
and I/O devices.
We have chosen to use two memories:
o A frame buffer for the pixels to be displayed
o A separate program/data memory for general use by the CPU
The software block diagram fairly closely follows the system block diagram.
We have added a timer to control when we read the buttons on the user interface and render data
onto the screen.
F IGURE 1.6: Hardware of moving map
F IGURE 1.7: Hardware of moving map

Architectural descriptions must be designed to satisfy both functional and nonfunctional require-
ments.
Not only must all the required functions be present, but we must meet cost, speed, power and other
nonfunctional constraints.
Starting out with system architecture and refining that to hardware and software architectures is
one good way to ensure that we meet all specifications:
We can concentrate on the functional elements in the system block diagram, and then consider the
nonfunctional constraints when creating the hardware and software architectures.
4. Designing Hardware and Software Components The architectural description tells us what
components we need. In general the components will include both hardware—FPGAs, boards,
and so on—and software modules. Some of the components will be ready-made. The CPU, for
example, will be a standard component in almost all cases, as will memory chips and many other
components. In the moving map, the GPS receiver is a good example of a specialized component
that will nonetheless be a predesigned, standard component. We can also make use of standard
software modules. One good example is the topographic database.
Standard topographic databases exist, and you probably want to use standard routines to access
the database—the data in a predefined format and it is highly compressed to save storage. Using
standard software for these access functions not only saves us design time.
5. System Integration:
Putting hardware and software components together will give complete working system. Bugs are
typically found during system integration, and good planning can help us to find the bugs quickly.
If we debug only a few modules at a time, we are more likely to uncover the simple bugs and able
to easily recognize them.
System integration is difficult because it usually uncovers problems. It is often hard to observe
the system in sufficient detail to determine exactly what is wrong— the debugging facilities for
embedded systems are usually much more limited than what you would find on desktop systems.
As a result, determining why things do not work correctly and how they can be fixed is a challenge
in itself.
1.8 Formalisms For System Design
We perform a number of different design tasks at different levels of abstraction: creating require-
ments and specifications, architecting the system, designing code, and designing tests. It is often
helpful to conceptualize these tasks in diagrams.
The Unified Modeling Language (UML). UML was designed to be useful at many levels of ab-
straction in the design process. UML is an object-oriented modeling language. The design in
terms of actual objects helps us to understand the natural structure of the system.
Object-oriented specification can be seen in two complementary ways: • Object-oriented spec-

ification allows a system to be described in a way that closely models real- world objects and their
interactions.
• Object-oriented specification provides a basic set of primitives that can be used to describe sys-
tems with particular attributes, irrespective of the relationships of those systems’ components to
real-world objects.
What is the relationship between an object-oriented specification and an object oriented program-
ming language?
A specification language may not be executable. But both object-oriented specification and pro-
gramming languages provide similar basic methods for structuring large systems.
1.8.1 Structural Description:
By structural description, we mean the basic components of the system. The principal component
of an object-oriented design is object. An object includes a set of attributes that define its internal
state.
When implemented in a programming language, these attributes usually become variables or con-
stants held in a data structure. In some cases, we will add the type of the attribute after the attribute
name for clarity, but we do not always have to specify a type for an attribute.
An object describing a display (such as a CRT screen) is shown in UML notation in Figure 1.5.
F IGURE 1.8: An object in UML notation
The text in the folded-corner page icon is a note; it does not correspond to an object in the system
and only serves as a comment. The attribute is, in this case, an array of pixels that holds the con-
tents of the display. The object is identified in two ways: It has a unique name, and it is a member
of a class. The name is underlined to show that this is a description of an object and not of a class.
A class is a form of type definition—all objects derived from the same class have the same char-
acteristics, although their attributes may have different values. A class defines the attributes that
an object may have. It also defines the operations that determine how the object interacts with the
rest of the world. In a programming language, the operations would become pieces of code used
to manipulate the object. The UML description of the Display class is shown in Figure 1.6.
F IGURE 1.9: A class in UML notation
The class has the name that we saw used in the d1 object since d1 is an instance of class Display.
The Display class defines the pixels attribute seen in the object; A class defines both the interface
for a particular type of object and that object’s implementation.
There are several types of relationships that can exist between objects and classes:
• Association occurs between objects that communicate with each other but have no owner-
ship relationship between them.
• Aggregation describes a complex object made of smaller objects.
• Composition is a type of aggregation in which the owner does not allow access to the com-
ponent objects.
• Generalization allows us to define one class in terms of another
Derived class: Unified Modeling Language, like most object-oriented languages, allows us to de-
fine one class in terms of another. An example is shown in Fig1.7, where we derive two particular
types of displays. The first, BW display, describes a black and- white display. This does not re-
quire us to add new attributes or operations, but we can specialize both to work on one-bit pixels.
A derived class inherits all the attributes and operations from its base class.
Here Display is the base class for the two derived classes. A derived class is defined to include
all the attributes of its base class. This relation is transitive if Display were derived from another
class, both BW display and Color map display would inherit all the attributes and operations of
Display’s base class as well.
F IGURE 1.10: Derived classes as a form of generalization in UML

Inheritance has two purposes. o It allows us to describe one class that shares some characteris-
tics with another class.
o It captures those relationships between classes and documents them.
Unified Modeling Language considers inheritance to be one form of generalization. A general-
ization relationship is shown in a UML diagram as an arrow with an open (unfilled) arrowhead.
Both BW display and Color map display are specific versions of Display, so Display generalizes
both of them.
Multiple inheritances:
In which a class is derived from more than one base class.
An example of multiple inheritances is shown in Figure 1.8; In this case, we have created a Multi-
media display class by combining the Display class with a Speaker class for sound.
The derived class inherits all the attributes and operations of both its base classes, Display and
Speaker.
F IGURE 1.11: Multiple inheritance in UML.
Link: A link describes a relationship between objects; association is to link as class is to object.
Fig1.9 shows an example of links and an association.
F IGURE 1.12: Links and association.
1.8.2 Behavioral Description:
We have to specify the behavior of the system as well as its structure. One way to specify the
behavior of an operation is a state machine.
Fig1.10 shows UML states; the transition between two states is shown by arrow. These state ma-
chines will not rely on the operation of a clock, as in hardware; rather, changes from one state to
another are triggered by the occurrence of events.
F IGURE 1.13: A state and transition in UML.
An event is some type of action. Events are divided into two categories.
They are:
External events: The event may originate outside the system, such as a user pressing a button.
Internal events: It may also originate inside, such as when one routine finishes its computation
and passes the result on to another routine.
We will concentrate on the following three types of events defined by UML, as illustrated in figure
1.11(signal and call event) and (Time out event)
o A signal is an asynchronous occurrence. It is defined in UML by an object that is labeled as a
¡¡signal¿¿. The object in the diagram serves as a declaration of the event’s existence.
Because it is an object, a signal may have parameters that are passed to the signal’s receiver.
o A call event follows the model of a procedure call in a programming language.
o A time-out event causes the machine to leave a state after a certain amount of time. The label
tm (time-value) on the edge gives the amount of time after which the transition occurs. A time-out
is generally implemented with an external timer.
F IGURE 1.14: Signal, call, time-out events in UML.

Unconditional and conditional transitions:
• The states in the state machine represent different conceptual operations.
• In some cases, we take conditional transitions out of states based on inputs or the results of
some computation done in the state.
• In other cases, we make an unconditional transition to the next state. Both the unconditional
and conditional transitions make use of the call event.
• Let’s consider a simple state machine specification to understand the semantics of UML
state machines. A state machine for an operation of the display is shown in Fig1.12. The
start and stop states are special states that help us to organize the flow of the state machine.
F IGURE 1.15: A state machine specification in UML.
Sequence diagram:
• It is sometimes useful to show the sequence of operations over time, particularly when
several objects are involved.
• In this case, we can create a sequence diagram, like the one for a mouse click scenario shown
in Fig1.13.
• A sequence diagram is somewhat similar to a hardware timing diagram, although the time
flows vertically in a sequence diagram, whereas time typically flows horizontally in a timing
diagram.
• The sequence diagram is designed to show a particular scenario or choice of events. In this
case, the sequence shows what happens when a mouse click is on the menu region.
• Processing includes three objects shown at the top of the diagram. Extending below each
object is its lifeline, a dashed line that shows how long the object is alive. In this case, all
the objects remain alive for the entire sequence, but in other cases objects may be created or
destroyed during processing.
• The boxes along the lifelines show the focus of control in the sequence, that is, when the
object is actively processing.
• In this case, the mouse object is active only long enough to create the mouse click event.
The display object remains in play longer; it in turn uses call events to invoke the menu
object twice: once to determine which menu item was selected and again to actually execute
the menu call.
• The find region ( ) call is internal to the display object, so it does not appear as an event in
the diagram.
F IGURE 1.16: A sequence diagram in UML.

1.9 Design Example: Model Train Controller
The model train controller, which is shown in the below figure.
i. The user sends messages to the train with the control box attached to the tracks.
ii. The control box may have familiar controls such as throttle, emergency stop button and so
on.
iii. Since train receives its electrical power from the track, the control box can send a signal to
the train over the track by modulating the power supply voltage.
iv. As shown Fig1.14, the control panel sends packet over the tracks to the receiver on the train.
Each packet includes an address so that the console can control several trains on the same
track. The packet also includes an error correction code (ECC) to guard against transmission
errors. This is a one-way communication system- the model train cannot send commands
back to the user.
Requirements:
Here is a basic set of requirements for the system:
F IGURE 1.17: A model train control system.
• The console shall be able to control up to eight trains on a single track.
• The speed of each train shall be controllable by a throttle to at least 63 different levels in
each direction (forward and reverse).
• There shall be an inertia control that shall allow the user to adjust the responsiveness of the
train to commanded changes in speed. Higher inertia means that the train responds more
slowly to a change in the throttle, simulating the inertia of a large train. The inertia control
will provide at least eight different levels.
• There shall be an emergency stop button.
• An error detection scheme will be used to transmit messages.
We can put the requirements into our chart format:

F IGURE 1.18: A model train control system.
Conceptual Specification of Model Train Controller:
1. Objects: Console , Train.
2. Commands: set speed, set inertia, Estop.
3. Console: panel, formatter, transmitter.
4. Train: receiver, controller, motor interface.
The conceptual specification allows us to understand the system little better. Writing of conceptual
specification will help us to write a detailed specification. Defining the messages will help us un-
derstand the functionality of the components. The set of commands that we can use to implement
the requirements placed on the system.
The system console controls the train by sending messages on to the tracks. The transmissions are
packetized: each packet includes an address and a message. A typical sequence of train control
commands is shown as a UML sequence diagram.
F IGURE 1.19: A UML sequence diagram for a typical sequence of train control commands.
• The focus of the control bars shows the both the console and receiver run continuously. The
packets can be sent at any time—there is no global clock controlling when the console sends
and the train receives, we do not have to worry about detecting collisions among the packets.
• Set- inertia message will send infrequently. Most of the message commands are speed com-
mands. When a train receives speed command, it will speed up and slow down the train
smoothly at rate determined by the set-inertia command.
• An emergency stop command may be received, which causes the train receiver to immedi-
ately shut down the train motor.
• We can model the commands in UML with two level class hierarchy as shown in the Fig1.16.
Here we have one base class command, there are three sub classes set-speed, set-inertia,
Estop, derived from base class. One for each specific type of command.
F IGURE 1.20: Class diagram for the train controller messages.

• We now need to model the train control system itself. There are clearly two major subsys-
tems: the control-box and the train board component. Each of these subsystems has its own
internal structure.
• The figure 1.17 Shows relationship between console and receiver (ignores role of track):
F IGURE 1.21: UML collaboration diagram for the major subsystems of the train controller sys-
tem.
• The console and receiver are each represented by objects: the console sends a sequence of
packets to the train receiver, as illustrated by the arrow. The notation on the arrow provides
both the type of message sent and its sequence in a flow of messages .we have numbered the
arrow’s messages as 1. . . n .
• Let’s break down the console and receiver into three major components.
• The console needs to perform three functions:
– Read state of front panel.
– Format messages.
– Transmit messages.
• The train receiver must also perform three major functions:
– receive message
– interpret message.
– control the train.
The UML class diagram is show in the below figure 1.18.
Console class roles: • Panel: Describes the console front panel, which contains analog knobs and
interface hardware to interface to the digital parts of the system.
• Formatter: It knows how to read the panel knobs and creates bit stream for message.
• Transmitter: Send the message along the track.

• Knobs* describes the actual analog knobs, buttons, and levers on the control panel.
• Sender* describes the analog electronics that send bits along the track.
Train class roles:
• Receiver: It knows how to turn the analog signal on the track into digital form.
• Controller: Interprets received commands and figures out how to control the motor.
• Motor interface: Generates the analog signals required to control the motor.
F IGURE 1.22: A UML class diagram for the train controller showing the composition of the
subsystems.
We define two classes to represent analog components:

o Detector* detects analog signals on the track and converts them into digital form.
o Pulser* turns digital commands into the analog signals required to control the motor speed.
Detailed Specification:
Conceptual specification that defines the basic classes, let’s refine it to create a more detailed spec-
ification. We won’t make a complete specification. But we will add details to the class. We can
now fill in the details of the conceptual specification. Sketching out the spec first helps us under-
stand the basic relationships in the system.
We need to define the analog components in a little more detail because there characteristics will
strongly influence the formatter and controller. Fig1.19 shows a little more detail than Fig 1.18, It
includes attributes and behavior of these classes. The panel has three knobs: train number (which
train is currently being controlled), speed (which can be positive or negative), and inertia. It also
has one button for emergency-stop.
The Sender and Detector classes are relatively simple: They simply put out and pick up a bit,
respectively.
F IGURE 1.23: Classes describing analog physical objects in the train control systems.
To understand the Pulser class, let’s consider how we actually control the train motor’s speed.
As shown in Figure 1.20, the speed of electric motors is commonly controlled using pulse-width
modulation: Power is applied in a pulse for a fraction of some fixed interval, with the fraction of
the time that power is applied determining the speed.
F IGURE 1.24: Controlling motor speed by pulse width modulation.
Figure 1.21 shows the classes for the panel and motor interfaces. These classes form the software
interfaces to their respective physical devices.
F IGURE 1.25: Class diagram for the Panel and Motor Interface.
• The Panel class defines a behavior for each of the controls on the panel;
• The new-settings behavior uses the set-knobs behavior of the Knobs* class to change the
knobs settings whenever the train number setting is changed.
• The Motor-interface defines an attribute for speed that can be set by other classes.
• The Transmitter and Receiver classes are shown in Figure 1.22.They provides the software
interface to the physical devices that send and receive bits along the track.
F IGURE 1.26: Class diagram for the transmitter and Receiver.
• The Transmitter provides a distinct behavior for each type of message that can be sent; it
internally takes care of formatting the message.
• The Receiver class provides a read-cmd behavior to read a message off the tracks.
• The Formatter class is shown in Figure 1.23. The formatter holds the current control settings
for all of the trains.
• The send-command method is a utility function that serves as the interface to the transmitter.
• The operate function performs the basic actions for the object.
• The panel-active behaviour returns true whenever the panel’s values do not correspond to
the current values.
F IGURE 1.27: Class diagram for the Formatter class.

The role of the formatter during the panel’s operation is illustrated by the sequence diagram of
Figure 1.24.
F IGURE 1.28: Sequences diagram for transmitting a control input.
• The figure shows two changes to the knob settings: first to the throttle, inertia, or emergency
stop; then to the train number.
• The panel is called periodically by the formatter to determine if any control settings have
changed. If a setting has changed for the current train, the formatter decides to send a
command, issuing a send- command behavior to cause the transmitter to send the bits.
• Because transmission is serial, it takes a noticeable amount of time for the transmitter to
finish a command; in the meantime, the formatter continues to check the panel’s control
settings.
• If the train number has changed, the formatter must cause the knob settings to be reset to the
proper values for the new train.
• The state diagram for a very simple version of the operate behavior of the Formatter class is
shown in Figure 1.25.
• This behavior watches the panel for activity: If the train number changes, it updates the
panel display; otherwise, it causes the required message to be sent.
F IGURE 1.29: Sequences diagram for transmitting a control input.
Figure 1.26 shows a state diagram for the panel-active behavior.
F IGURE 1.30: State diagram for the panel active behaviour.

The definition of the train’s Controller class is shown in Figure 1.27.

The operate behavior is called by the receiver when it gets a new command; operate looks at the
contents of the message and uses the issue-command behavior to change the speed, direction, and
inertia settings as necessary.
F IGURE 1.31: Class diagram for the controller class.
A specification for operate is shown in Figure 1.28.
F IGURE 1.32: State diagram for the controller operate behaviour.
The operation of the Controller class during the reception of a set-speed command is illustrated in
Figure 1.29.
F IGURE 1.33: Sequence diagram for a set-speed command received by the train.
Chapter 2
INTRODUCTION TO EMBEDDED C
AND APPLICATIONS
Course Outcomes
CO 2 Examine and write the Embedded Systems programming in C with Analyze
Keil Integrated Development Environment (IDE).
2.1 C LOOPING STRUCTURES
This section looks at the most efficient ways to code for and while loops on the ARM. We start
by looking at loops with a fixed number of iterations and then move on to loops with a variable
number of iterations. Finally we look at loop unrolling.
2.1.1 LOOPS WITH A FIXED NUMBER OF ITERATIONS
What is the most efficient way to write a for loop on the ARM? Let’s return to our checksum
example and look at the looping structure.
Here is the last version of the 64-llword packet checksum routine we studied in Section 5.2. This
shows how the compiler treats a loop with incrementing count i++.
44
Chapter 2. INTRODUCTION TO EMBEDDED C AND APPLICATIONS 45
int checksum v5(int *data)

{
unsigned int i;
int sum=0;
for (i=0; i¡64; i++)
{
sum += *(data++);
}
return sum;
}
This compiles to
checksum v5
MOV r2,r0 ; r2 = data
MOV r0,#0 ; sum = 0
MOV r1,#0 ;i=0
checksum v5 loop
LDR r3,[r2],#4 ; r3 = *(data++)
ADD r1,r1,#1 ; i++
CMP r1,#0x40 ; compare i, 64
ADD r0,r3,r0 ; sum += r3
BCC checksum v5 loop ; if (i¡64) goto loop
MOV pc,r14 ; return sum
It takes three instructions to implement the for loop structure:

• An ADD to increment i
• A compare to check if i is less than 64
• A conditional branch to continue the loop if i ¡ 64
This is not efficient. On the ARM, a loop should only use two instructions:
• A subtract to decrement the loop counter, which also sets the condition code flags on the result
• A conditional branch instruction
The key point is that the loop counter should count down to zero rather than counting up to some
arbitrary limit. Then the comparison with zero is free since the result is stored in the condition
flags. Since we are no longer using i as an array index, there is no problem in counting down rather
than up.
2.1.2 LOOPS USING A VARIABLE NUMBER OF ITERATIONS
Now suppose we want our checksum routine to handle packets of arbitrary size. We pass in a
variable N giving the number of words in the data packet. Using the lessons from the last section
we count down until N 0 and don’t require an extra loop counter i.
The checksum v7 example shows how the compiler handles a for loop with a variable number of
iterations N.
int checksum v7(int *data, unsigned int N)
{
int sum=0;
for (; N!=0; N–)
{
sum += *(data++);
}
return sum;
}
This compiles to
checksum v7
MOV r2,#0 ; sum = 0
CMP r1,#0 ; compare N, 0
BEQ checksum v7 end ; if (N==0) goto end
checksum v7 loop
LDR r3,[r0],#4 ; r3 = *(data++)
SUBS r1,r1,#1 ; N– and set flags
BNE checksum v7 loop ; if (N=0) goto loop
checksum v7 end
MOV r0,r2 ; r0 = sum
MOV pc,r14 ; return r0
Notice that the compiler checks that N is nonzero on entry to the function. Often this check is
unnecessary since you know that the array won’t be empty. In this case a do-while loop gives
better performance and code density than a for loop
2.1.3 LOOP UNROLLING
We saw in Section 5.3.1 that each loop iteration costs two instructions in addition to the body of
the loop: a subtract to decrement the loop count and a conditional branch.
We call these instructions the loop overhead. On ARM7 or ARM9 processors the subtract takes
one cycle and the branch three cycles, giving an overhead of four cycles per loop.
You can save some of these cycles by unrolling a loop—repeating the loop body several times,
and reducing the number of loop iterations by the same proportion. For example, let’s unroll our
packet checksum example four times.
EXAMPLE1 The following code unrolls our packet checksum loop by four times. We assume that
the number of words in the packet N is a multiple of four.
int checksum v9(int *data, unsigned int N)
{
int sum=0;
do
{
sum += *(data++);
sum += *(data++); sum += *(data++);
sum += *(data++);
N -= 4;
} while (N!=0);
return sum;
}
This compiles to
checksum v9
MOV r2,#0 ; sum = 0
checksum v9 loop LDR r3,[r0],#4 ; r3 = *(data++)
SUBS r1,r1,#4 ; N -= 4 & set flags

LDR r3,[r0],#4 ; r3 = *(data++)
LDR r3,[r0],#4 ; r3 = *(data++)
LDR r3,[r0],#4 ; r3 = *(data++)
BNE checksum v9 loop ; if (N!=0) goto loop
MOV r0,r2 ; r0 = sum
MOV pc,r14 ; return r0
We have reduced the loop overhead from 4N cycles to (4N)/4 N cycles. On the ARM7TDMI, this
accelerates the loop from 8 cycles per accumulate to 20/4 5 cycles per accumulate, nearly doubling
the speed! For the ARM9TDMI, which has a faster load instruction, the benefit is even higher.
There are two questions you need to ask when unrolling a loop:
• How many times should I unroll the loop?
• What if the number of loop iterations is not a multiple of the unroll amount? For example, what
if N is not a multiple of four in checksum v9?
To start with the first question, only unroll loops that are important for the overall performance
of the application. Otherwise unrolling will increase the code size with little performance benefit.
Unrolling may even reduce performance by evicting more important code from the cache.
Suppose the loop is important, for example, 30% of the entire application. Suppose you unroll
the loop until it is 0.5 KB in code size (128 instructions). Then the loop overhead is at most 4
cycles compared to a loop body of around 128 cycles. The loop overhead cost is 3/128, roughly
3%. Recalling that the loop is 30% of the entire application, overall the loop overhead is only 1%.
Unrolling the code further gains little extra performance, but has a significant impact on the cache
contents. It is usually not worth unrolling further when the gain is less than 1%.
For the second question, try to arrange it so that array sizes are multiples of your unroll amount. If
this isn’t possible, then you must add extra code to take care of the leftover cases.
2.2 REGISTER ALLOCATION
The compiler attempts to allocate a processor register to each local variable you use in a C func-
tion. It will try to use the same register for different local variables if the uses of the variables do
not overlap. When there are more local variables than available registers, the compiler stores the
excess variables on the processor stack. These variables are called spilled or swapped out variables
since they are written out to memory (in a similar way virtual memory is swapped out to disk).
Spilled variables are slow to access compared to variables allocated to registers. To implement a
function efficiently, you need to
•minimize the number of spilled variables
•ensure that the most important and frequently accessed variables are stored in registers
First let’s look at the number of processor registers the ARM C compilers have available for
allocating variables. Table 5.3 shows the standard register names and usage when following the
ARM-Thumb procedure call standard (ATPCS), which is used in code generated by C compilers.
Provided the compiler is not using software stack checking or a frame pointer, then the C compiler
can use registers r0 to r12 and r14 to hold variables. It must save the callee values of r4 to r11 and
r14 on the stack if using these registers.
In theory, the C compiler can assign 14 variables to registers without spillage. In practice, some
compilers use a fixed register such as r12 for intermediate scratch working and do not assign vari-
ables to this register. Also, complex expressions require intermediate working registers to evaluate.
Therefore, to ensure good assignment to registers, you should try to limit the internal loop of func-
tions to using at most 12 local variables.
If the compiler does need to swap out variables, then it chooses which variables to swap out based
on frequency of use. A variable used inside a loop counts multiple times. You can guide the com-
piler as to which variables are important by ensuring these variables are used within the innermost
loop.
The register keyword in C hints that a compiler should allocate the given variable to a register.
However, different compilers treat this keyword in different ways, and different architectures have
a different number of available registers (for example, Thumb and ARM). Therefore we recom-
mend that you avoid using register and rely on the compiler’s normal register allocation routine.
SUMMARY Efficient Register Allocation
• Try to limit the number of local variables in the internal loop of functions to 12. The compiler
F IGURE 2.1: C compiler register usage
should be able to allocate these to ARM registers.

• You can guide the compiler as to which variables are important by ensuring these variables are
used within the innermost loop.
2.3 FUNCTION CALLS
The ARM Procedure Call Standard (APCS) defines how to pass function arguments and return
values in ARM registers. The more recent ARM-Thumb Procedure Call Standard (ATPCS) covers
ARM and Thumb interworking as well. The first four integer arguments are passed in the first four
ARM registers: r0, r1, r2, and r3. Subsequent integer arguments are placed on the full descending
stack, ascending in memory as in Figure 5.1. Function return integer values are passed in r0. This
description covers only integer or pointer arguments. Two-word arguments such as long or double
are passed in a pair of consecutive argument registers and returned in r0, r1. The compiler may pass
structures in registers or by reference according to command line compiler options. The first point
to note about the procedure call standard is the four-register rule. Functions with four or fewer
arguments are far more efficient to call than functions with five or more arguments. For functions
with four or fewer arguments, the compiler can pass all the arguments in registers. For functions
with more arguments, both the caller and callee must access the stack for some arguments. Note
that for C++ the first argument to an object method is the this pointer. This argument is implicit
and additional to the explicit arguments.
If your C function needs more than four arguments, or your C++ method more than three explicit
arguments, then it is almost always more efficient to use structures. Group related arguments into
structures, and pass a structure pointer rather than multiple arguments. Which arguments are
related will depend on the structure of your software.
The next example illustrates the benefits of using a structure pointer. First we show a typical
F IGURE 2.2: ATPCS Argument passing
routine to insert N bytes from array data into a queue. We implement the queue using a cyclic
buffer with start address Q start (inclusive) and end address Q end (exclusive).
char *queue bytes v1(
char *Q start, /* Queue buffer start address */
char *Q end, /* Queue buffer end address */
char *Q ptr, /* Current queue pointer position */
char *data, /* Data to insert into the queue */
unsigned int N) /* Number of bytes to insert */
{
do {
*(Q ptr++) = *(data++); if (Q ptr == Q end) {
Q ptr = Q start; }
}
while (–N); return Q ptr; }
This compiles to
queue bytes v1 STR r14,[r13,#-4]; save lr on the stack
LDR r12,[r13,#4] ; r12 = N
queue v1 loop
LDRB r14,[r3],#1 ; r14 = *(data++)
STRB r14,[r2],#1 ; *(Q ptr++) = r14
CMP r2,r1 ; if (Q ptr == Q end)
MOVEQ r2,r0 ; Q ptr = Q start;
SUBS r12,r12,#1 ; –N and set flags
BNE queue v1 loop ; if (N!=0) goto loop
MOV r0,r2 ; r0 = Q ptr
LDR pc,[r13],#4 ; return r0
Compare this with a more structured approach using three function arguments.
2.3.1 POINTER ALIASING:
Two pointers are said to alias when they point to the same address. If you write to one pointer,
it will affect the value you read from the other pointer. In a function, the compiler often doesn’t
know which pointers can alias and which pointers can’t. The compiler must be very pessimistic
and assume that any write to a pointer may affect the value read from any other pointer, which can
significantly reduce code efficiency.
2.3.2 UNALIGNED DATA AND ENDIANNESS:
Unaligned data and endianness are two issues that can complicate memory accesses and portabil-
ity. In computing endianness is the ordering or sequencing of bytes of a word of digital data in
computer memory storage or during transmission. A big-endian system stores the most significant
byte of a word at the smallest memory address and the least significant byte at the largest memory
address.
A memory access is said to be aligned when the data being accessed is n bytes long and the datum
address is n-byte aligned. A memory pointer that refers to primitive data that is n bytes long is
said to be aligned if it is only allowed to contain addresses that are n-byte aligned, otherwise it is
said to be unaligned.
2.3.3 INLINE FUNCTIONS AND INLINE ASSEMBLY:
Generally the inline term is used to instruct the compiler to insert the code of a function into the
code of its caller at the point where the actual call is made. Such functions are called ”inline
functions”.It is just a set of assembly instructions written as inline functions.
2.4 EMBEDDED SYSTEMS PROGRAMMING IN C:
The embedded firmware is responsible for controlling the various peripherals of the embedded
hard-ware and generating response in accordance with the functional requirements mentioned in
the requirements for the particular embedded product.
Firmware is considered as the master brain of the embedded system.
Imparting intelligence to an embedded system is a onetime process and it can happen at any stage,
it can be immediately after the fabrication of the embedded hardware or at a later stage.
Whenever the conventional ’C’ Language and its extensions are used for programming embedded
systems, it is referred as ’Embedded C’ programming. Programming in ’Embedded C’ is quite
different from conventional Desktop application development using ’C’ language for a particular
OS platform. Desktop computers contain working memory in the range of Megabytes (Nowadays
Giga bytes) and storage memory in the range of Giga bytes. For a desktop application developer,
the resources available are surplus in quantity and they can be very lavish in the usage of RAM
and ROM and no restrictions are imposed at all. This is not the case for embedded application
developers.
Almost all embedded systems are limited in both storage and working memory resources. Em-
bedded application developers should be aware of this fact and should develop applications in the
best possible way which optimizes the code memory and working memory usage as well as per-
formance. In other words, the hands of an embedded application developer are always tied up in
the memory usage context.
2.4.1 ’C’ v/s. ’Embedded C’:
’C’ is a well structured, well defined and standardized general purpose programming language
with extensive bit manipulation support.
’C’ offers a combination of the features of high level language and assembly and helps in hard-
ware access programming (system level programming) as well as business package developments
(Application developments like pay roll systems, banking applications, etc).
The conventional ’C’ language follows ANSI(American National Standards Institute) standard and
it incorporates various library files for different operating systems.
A platform (operating system) specific application, known as, compiler is used for the conversion
of programs written in ’C’ to the target processor (on which the OS is running) specific binary
files. Hence it is a platform specific development.
Embedded ’C’ can be considered as a subset of conventional ’C’ language. Embedded ’C’ sup-
ports all ’C’ instructions and incorporates a few target processor specific functions/instructions.
It should be noted that the standard ANSI ’C’ library implementation is always tailored to the
target processor/controller library files in Embedded ’C’.
A software program called ’Cross-compiler’ is used for the conversion of programs written in
Embedded ’C’ to target processor/controller specific instructions (machine language).
2.4.2 Compiler vs. Cross-Compiler:
Compiler is a software tool that converts a source code written in a high level language on top
of a particular operating system running on a specific target processor architecture (e.g. Intel
x86/Pentium).Here the operating system, the compiler program and the application making use
of the source code run on the same target processor. The source code is converted to the target
processor specific machine instructions.
The development is platform specific (OS as well as target processor on which the OS is running).
Compilers are generally termed as ’Native Compilers’. A native compiler generates machine code
for the same machine (processor) on which it is running.
Cross-compilers are the software tools used in cross-platform development applications. In cross-
platform development, the compiler running on a particular target processor/OS converts the
source code to machine code for a target.
Embedded system development is a typical example for cross-platform development where em-
bedded firmware is developed on a machine with Intel/AMD or any other target processors and the
same is converted into machine code for any other target processor architecture (e.g. 8051, PIC,
ARM, etc).
Keil C51 is an example for cross-compiler. The term ’Compiler’ is used interchangeably with
’Cross-compiler’ in embedded firmware applications. Whenever you see the term ’Compiler’
related to any embedded firmware application, please understand that it is referring to the cross-
compiler.
F IGURE 2.3: Compiler vs. Cross-Compiler

2.4.3 Using ‘C’ in ‘Embedded C’:
Let us brush up whatever we learned in conventional ’C’ programming. Remember we will only
go through the peripheral aspects and will not go in deep.
Keywords and Identifiers:
Keywords are the reserved names used by the ’C’ language. All keywords have a fixed meaning in
the ’C’ language context and they are not allowed for programmers for naming their own variables
or functions. ANSI ’C’ supports 32 keywords and they are listed below.
All ’C’ supported keywords should be written in ’lowercase’ letters.
C Keywords are predefined, reserved words used in programming that have special meanings to the
compiler. Identifiers are user defined names and labels. Identifiers can contain letters of English
F IGURE 2.4: C Keywords
alphabet (both upper and lower case) and numbers. The starting character of an identifier should
be a letter. The only special character allowed in identifier is underscore ( ). Ex: Root, getchar,
sin, x 1, x1, If
Data Types:
Data type represents the type of data held by a variable. The various data types supported, their
storage space (bits) and storage capacity for ’C’ language are tabulated below.
F IGURE 2.5: Data Types
2.4.4 Arithmetic and Relational Operations:
The list of arithmetic operations supported by ‘C’ are listed below.

Logical Operations: Logical operations are usually performed for decision making and program
F IGURE 2.6: Arithmetic Operations
control transfer.
F IGURE 2.7: Logical Operations
2.4.5 Looping Instructions:
Looping instructions are used for executing a particular block of code repeatedly till a condition
is met or wait till an event is fired. Embedded programming often uses the looping instructions
for checking the status of certain I/O ports, registers, etc. and also for producing delays. Certain
devices allow write/read operations to and from some registers of the device only when the device
is ready and the device ready is normally indicated by a status register or by setting/clearing certain
bits of status registers. Hence the program should keep on reading the status register till the device
ready indication comes. The reading operation forms a loop. The looping instructions supported
by are listed below.
F IGURE 2.8: Looping Instructions
2.4.6 Arrays and Pointers:
Array is a collection of related elements (data types).

Arrays are usually declared with data type of array, name of the array and the number of related
elements to be placed in the array.
For example the following array declaration declares a character array with name ‘arr’ and reserves
space for 5 character elements in the memory as below figure. The elements of an array are
F IGURE 2.9: char arr [5]
accessed by using the array index or subscript. The index of the first element is ’0’. For the above
example the first element is accessed by arr[0], second element by arr[1], and so on. In the above
example, the array starts at memory location 0x8000 (arbitrary value taken for illustration) and the
address of the first element is 0x8000.
The ‘address of operator (&) returns the address of the memory location where the variable is
stored. Hence&arr[0] will return 0x8000 and &arr[1] will return 0x8001, etc.. The name of the
array itself with no index (subscript) always returns the address of the first element. If we examine
the first element arr[0] of the above array, we can see that the variable arr[0] is allocated a memory
location 0x8000 and the contents of that memory location holds the value for arr[0].
2.4.7 Pointers:
Pointer is a flexible at the same time most dangerous feature, capable of creating potential damages
leading to firmware crash, if not used properly.
Pointer is a memory pointing based technique for variable access and modification. Pointers are
very helpful in
1. Accessing and modifying variables
2. Increasing speed of execution
3. Accessing contents within a block of memory
4. Passing variables to functions by eliminating the use of a local copy of variables
5. Dynamic memory.
2.5 BINDING AND RUNNING EMBEDDED C PROGRAM IN KEIL

IDE:
Embedded system means some combination of computer hardware and programmable software
which is specially designed for a particular task like displaying message on LCD. If you are still
wondering about an embedded system, just take a look at these circuit applications using 8051
microcontroller. You can call these applications embedded systems as it involves hardware (8051
microcontroller) and software (the code written in assembly language).
Some real life examples of embedded systems may involve ticketing machines, vending machines,
temperature controlling unit in air conditioners etc. Microcontrollers are nothing without a Pro-
gram in it.
One of the important part in making an embedded system is loading the software/program we
develop into the microcontroller. Usually it is called “burning software” into the controller. Be-
fore “burning a program” into a controller, we must do certain prerequisite operations with the
program. This includes writing the program in assembly language or C language in a text editor
like notepad, compiling the program in a compiler and finally generating the hex code from the
compiled program. Earlier people used different softwares/applications for all these 3 tasks. Writ-
ing was done in a text editor like notepad/WordPad, compiling was done using a separate software
(probably a dedicated compiler for a particular controller like 8051), converting the assembly code
to hex code was done using another software etc. It takes lot of time and work to do all these sepa-
rately, especially when the task involves lots of error debugging and reworking on the source code.
textbfKeil MicroVision is free software which solves many of the pain points for an embedded
program developer. This software is an integrated development environment (IDE), which inte-
grated a text editor to write programs, a compiler and it will convert your source code to hex files
too. Here is simple guide to start working with Keil uVision which can be used for
• Writing programs in C/C++ or Assembly language
• Compiling and Assembling Programs
• Debugging program
• Creating Hex and Axf
• Testing your program without Available real Hardware (Simulator Mode)
This is simple guide on Keil uVision 4 though also applicable on previous versions also.
These are the simple steps to get off the mark your inning
Step 1: After opening Keil uV4, Go to Project tab and Create new uVision project Now Select
new folder and give name to Project.
F IGURE 2.10: Create new uVision project

Step 2: After Creating project now Select your device model. Example.NXP-LPC2148 [You can
change it later from project window.]
F IGURE 2.11: Select your device model
Step 3: so now your project is created and Message window will appear to add startup file of your
Device click on Yes so it will be added to your project folder
F IGURE 2.12: Message window
Step 4: Now go to File and create new file and save it with .C extension if you will write program
in C language or save with .asm for assembly language. i.e., Led.c
F IGURE 2.13: create new file and save it with .C
Step 5: Now write your program and save it again. You can try example given at end of this
tutorial.
Step 6: After that on left you see project window [if it’s not there. . . .go to View tab and click on
project window] Now come on Project window.
F IGURE 2.14: Project window
Right click on target and click on options for target Here you can change your device also.
Click output tab here & check create Hex file if you want to generate hex file Now click on ok so
it will save changes
F IGURE 2.15: Right click on target and click on options for target
F IGURE 2.16: create Hex file
Step 7: Now Expand target and you will see source group Right click on group and click on Add
files to source group
F IGURE 2.17: Add files to source group
Now add your program file which you have written in C/assembly.
You can see program file added under source group.
Step 8: Now Click on Build target.You can find it under Project tab or in toolbar.It can also be
done by pressing F7 key.
Step 9: you can see Status of your program in Build output window
Now you are done with your program.

F IGURE 2.18: Build target
F IGURE 2.19: Build target
2.5.1 BASIC TECHNIQUES FOR READING FROM PORT PINS:
As we saw in Chapter 3, control of the 8051 ports is carried out using 8-bit latches (SFRs). We
can send some data to Port 1 as follows:
sfr P1 = 0x90; // Usually in header file
P1 = 0x0F; // Write 00001111 to Port 1
In exactly the same way, we can read from Port 1 as follows:
unsigned char Port data;
P1 = 0xFF; // Set the port to ’read mode’
Port data = P1; // Read from the port

After the 8051 microcontroller is reset, the port latches all have the value 0xFF (11111111 in bi-
nary): that is, all the port-pin latches are set to values of ‘1’. It is tempting to assume that writing
data to the port is therefore unnecessary, and that we can get away with the following version:
// Assume nothing written to port since reset
// – DANGEROUS
Port data = P1;
The problem with this code is that, in simple test programs it works: this can lull the developer
into a false sense of security. If, at a later date, someone modifies the program to include a routine
for writing to all or part of the same port, this code will not generally work as required:
P1 = 0x00;
// Assumes nothing written to port since reset // – WON’T WORK Port data = P1;
In most cases, initialization functions are used to set the port pins to a known state at the start
of the program. Where this is not possible, it is safer to always write ‘1’ to any port pin before
reading from it. ;Toggle all bits of continuously.
F IGURE 2.20: pin diagram of 8051 microcontroller
MOV A,#55
BACK: MOV P2,A
ACALL DELAY
CPL A ;complement(inv) reg.A
SJMP BACK
Reading and writing bits:
•Demonstrated how to read from or write to an entire port. However, suppose we have a switch
connected to Pin 1.1 and an LED connected to Pin 2.1.
•We might also have input and output devices connected to the other pins on Port 1.
•These pins may be used by totally different parts of the same system, and the code to access them
may be produced by other team members, or other companies.
•It is therefore essential that we are able to read-from or write-to individual port pins without al-
tering the values of other pins on the same port.
•We provided a simple example to illustrates how we can read from Pin 1.1, and write to Pin 2.1,
without disrupting any other pins on this (or any other) port.
#include¡reg51.h¿
F IGURE 2.21: Reading and writing bits
sbit Led = P21̂; //pin connected to toggle Led

sbit Switch =P11̂; //Pin connected to toggle led
void main(void)
{
Led = 0; //configuring as output pin
Switch = 1; //Configuring as input pin
while(1) //Continuous monitor the status of the switch.
{
if(Switch == 0)
{
Led =1; //Led On
}
else
{
Led =0; //Led Off
}
}
return 0;
}
2.6 SWITCH BOUNCE:
In an ideal world, this change in voltage obtained by connecting a switch to the port pin of an 8051
microcontroller would take the form illustrated in Figure 4.8 (top). In practice, all mechanical
switch contacts bounce (that is, turn on and off, repeatedly, for a short period of time) after the
switch is closed or opened. As a result, the actual input waveform looks more like that shown
in Figure 4.8 (bottom). Usually, switches bounce for less than 20 ms: however large mechanical
switches exhibit bounce behaviour for 50 ms or more.
When you turn on the lights in your home or office with a mechanical switch, the switches will
bounce. As far as humans are concerned, this bounce is imperceptible.
However, as far as the microcontroller is concerned, each ‘bounce’ is equivalent to one press and
release of an ‘ideal’ switch. Without appropriate software design, this can give rise to a number of
problems, not least:
•Rather than reading ‘A’ from a keypad, we may read ‘AAAAA’.
• Counting the number of times that a switch is pressed becomes extremely difficult.
• If a switch is depressed once, and then released some time later, the ‘bounce’ may make it appear
as if the switch has been pressed again (at the time of release).
F IGURE 2.22: SWITCH BOUNCE
2.7 APPLICATIONS:
2.7.1 LED INTERFACING WITH 8051 TO A SINGLE PIN:
F IGURE 2.23: LED INTERFACING WITH 8051 TO A SINGLE PIN
Program:
#include¡reg51.h¿ // special function register declarations
sbit LED = P20̂; // Defining LED pin
void Delay(void); // Function prototype declaration
void main (void)

{
while(1) // infinite loop
{
LED = 0; // LED ON
Delay();
LED = 1; // LED OFF
Delay();
}
}
void Delay(void)
{
int j;
int i;
for(i=0;i¡10;i++)
{
for(j=0;j¡10000;j++)
{
}
}
}
2.7.2 LED’s interfacing with Port, P1 of 8051:
Program: #include¡REG51.H¿
#define LEDPORT P1
void delay(unsigned int);
void main(void)
{
LEDPORT =0x00;
while(1)
{
LEDPORT = 0X00;
delay(250);
LEDPORT = 0xff;
delay(250);
}
}
void delay(unsigned int itime)
{
unsigned int i,j;
for(i=0;i¡itime;i++)
{
for(j=0;j¡250;j++);
}
}
F IGURE 2.24: LED’s interfacing with Port, P1 of 8051

2.7.3 4X4 MATRIX KEYPAD INTERFACING WITH 8051 MICROCONTROLLER:
Keypads/Keyboards are widely used input devices being used in various electronics and embedded
projects. They are used to take inputs in the form of numbers and alphabets, and feed the same into
system for further processing. In this tutorial we are going to interface a 4x4 matrix keypad/Key-
board with 8051 microcontroller.
Before we interface the keypad with microcontroller, first we need to understand how it works.
Matrix keypad consists of set of Push buttons, which are interconnected. Like in our case we are
using 4X4 matrix keypad, in which there are 4 push buttons in each of four rows. And the terminals
of the push buttons are connected according to diagram. In first row, one terminal of all the 4 push
buttons are connected together and another terminal of 4 push buttons are representing each of 4
columns, same goes for each row. So we are getting 8 terminals to connect with a microcontroller.
F IGURE 2.25: 4X4 MATRIX KEYPAD INTERFACING WITH 8051 MICROCONTROLLER
2.7.4 Interfacing keypad with 8051 microcontroller (P89V51RD2):
As shown in above circuit diagram, to interface Keypad, we need to connect 8 terminals of the
keypad to any port (8 pins) of the microcontroller. Like we have connected keypad terminals to
Port 1 of 8051. Whenever any button is pressed we need to get the location of the button, means
the corresponding ROW and COLUMN no. Once we get the location of the button, we can print
F IGURE 2.26: Interfacing keypad with 8051 microcontroller (P89V51RD2)
the character accordingly. Now the question is how to get the location of the pressed button? I am
going to explain this in below steps and also want you to look at the code:
1. First we have made all the Rows to Logic level 0 and all the columns to Logic level 1.
2. Whenever we press a button, column and row corresponding to that button gets shorted and
makes the corresponding column to logic level 0. Because that column becomes connected (shorted)
to the row, which is at Logic level 0. So we get the column no. See main() function.
3. Now we need to find the Row no., so we have created four functions corresponding to each
column. Like if any button of column one is pressed, we call function row finder1(), to find the
row no.
4. In row finder1() function, we reversed the logic levels, means now all the Rows are 1 and
columns are 0. Now Row of the pressed button should be 0 because it has become connected
(shorted) to the column whose button is pressed, and all the columns are at 0 logic. So we have
scanned all rows for 0.
5. So whenever we find the Row at logic 0, means that is the row of pressed button. So now
we have column no (got in step 2) and row no., and we can print no. of that button using lcd data
function.
F IGURE 2.27: FINDING COLUMN NUMBER
F IGURE 2.28: FINDING ROW NUMBER
Same procedure follows for every button press, and we are using while(1), to continuously check,
whether button is pressed or not.
Code:
#include¡reg51.h¿
#define display port P2 //Data pins connected to port 2 on microcontroller
sbit rs = P30̂; //RS pin connected to pin 2 of port 3
sbit rw = P31̂; // RW pin connected to pin 3 of port 3
sbit e = P32̂; //E pin connected to pin 4 of port 3
sbit C4 = P10̂; // Connecting keypad to Port 1
sbit C3 = P11̂;
sbit C2 = P12̂;
sbit C1 = P13̂;
sbit R4 = P14̂;
sbit R3 = P15̂;
sbit R2 = P16̂;
sbit R1 = P17̂;
void msdelay(unsigned int time) // Function for creating delay in milliseconds.
{
unsigned i,j ;
for(i=0;i¡time;i++)
for(j=0;j¡1275;j++);
}
void lcd cmd(unsigned char command) //Function to send command instruction to LCD
{
display port = command;
rs= 0;
rw=0;
e=1;
msdelay(1);
e=0;
}
void lcd data(unsigned char disp data) //Function to send display data to LCD
{
display port = disp data;
rs= 1;
rw=0;
e=1;
msdelay(1);
e=0;
}
void lcd init() //Function to prepare the LCD and get it ready
{
lcd cmd(0x38); // for using 2 lines and 5X7 matrix of LCD
msdelay(10);
lcd cmd(0x0F); // turn display ON, cursor blinking

msdelay(10);
lcd cmd(0x01); //clear screen msdelay(10);
lcd cmd(0x81); // bring cursor to position 1 of line 1
msdelay(10);
}
void row finder1() //Function for finding the row for column 1
{
R1=R2=R3=R4=1;
C1=C2=C3=C4=0;
if(R1==0)
lcd data(’7’);
if(R2==0)
lcd data(’4’);
if(R3==0)
lcd data(’1’);
if(R4==0)
lcd data(’N’);
}
{
R1=R2=R3=R4=1;
C1=C2=C3=C4=0;
if(R1==0)
lcd data(’8’);
if(R2==0)
lcd data(’5’);
if(R3==0)
lcd data(’2’);
if(R4==0)
lcd data(’0’);
}
{
R1=R2=R3=R4=1;
C1=C2=C3=C4=0;
if(R1==0)
lcd data(’9’);
if(R2==0)
lcd data(’6’);
if(R3==0)
lcd data(’3’);
if(R4==0)
lcd data(’=’);
}
{
R1=R2=R3=R4=1;
C1=C2=C3=C4=0;
if(R1==0)
lcd data(’if(R2==0)
lcd data(’*’);
if(R3==0)
lcd data(’-’);
if(R4==0)
lcd data(’+’);
}
void main()
{
lcd init();
while(1)
{
msdelay(30);
C1=C2=C3=C4=1;
R1=R2=R3=R4=0;
if(C1==0)
row finder1();
else if(C2==0)
row finder2();
else if(C3==0)
row finder3();
else if(C4==0)
row finder4();
}
}
2.7.5 7 SEGMENT DISPLAY INTERFACING WITH 8051 MICROCONTROLLER:
F IGURE 2.29: 7 SEGMENT DISPLAY INTERFACING WITH 8051 MICROCONTROLLER
This is how to interface a seven segment LED display to an 8051 microcontroller. 7 segment LED
display is very popular and it can display digits from 0 to 9 and quite a few characters. Knowledge
about how to interface a seven segment display to a micro controller is very essential in designing
embedded systems. Seven segment displays are of two types, common cathode and common an-
ode. In common cathode type , the cathode of all LEDs are tied together to a single terminal which
is usually labeled as ‘com‘ and the anode of all LEDs are left alone as individual pins labeled as a,
b, c, d, e, f, g & h (or dot) .
In common anode type, the anodes of all LEDs are tied together as a single terminal and cathodes
are left alone as individual pins.
Program:
F IGURE 2.30: 7 SEGMENT DISPLAY INTERFACING DISCRIPTION
/*Program to interface seven segment display unit.*/

#include ¡REG51.H¿
#define LEDPORT P0
#define ZERO 0x3f
#define ONE 0x06
#define TWO 0x5b
#define THREE 0x4f
#define FOUR 0x66
#define FIVE 0x6d
#define SIX 0x7d
#define SEVEN 0x07
#define EIGHT 0x7f
#define NINE 0x6f
#define TEN 0x77
#define ELEVEN 0x7c
#define TWELVE 0x39
#define THIRTEEN 0x5e
#define FOURTEEN 0x79
#define FIFTEEN 0x71
void Delay(void);
void main (void)
{
while(1)
{
LEDPORT = ZERO;
Delay();
LEDPORT = ONE;
Delay();
LEDPORT = TWO;
Delay();
LEDPORT = THREE;
Delay();
LEDPORT = FOUR;
Delay();
LEDPORT = FIVE;
Delay();
LEDPORT = SIX;
Delay();
LEDPORT = SEVEN;
Delay();
LEDPORT = FOURTEEN;
Delay();
LEDPORT = FIFTEEN;
Delay();
}
}
void Delay(void)
{
int j; int i;
for(i=0;i¡30;i++)
{
for(j=0;j¡10000;j++)
{
}
}
}
2.7.6 LCD DISPLAY INTERFACING WITH 8051 MICROCONTROLLER:
In this, we will have brief discussion on how to interface 16×2 LCD module to P89V51RD2,
which is an 8051 family microcontroller. We use LCD display for the displaying messages in a
more interactive way to operate the system or displaying error messages etc. Interfacing 16×2
LCD with 8051 microcontroller is very easy if you understanding the working of LCD. 16×2
Liquid Crystal Display which will display the 32 characters at a time in two rows (16 characters
in one row). Each character in the display is of size 5×7 pixel matrix.
F IGURE 2.31: LCD Display
F IGURE 2.32: Pin Desciption
Follow these simple steps for displaying a character or data

E=1; enable pin should be high
RS=1; Register select should be high
R/W=0; Read/Write pin should be low.
To send a command to the LCD just follows these steps:
E=1; enable pin should be high
F IGURE 2.33: Pin Desciption
F IGURE 2.34: LCD DISPLAY INTERFACING WITH 8051 MICROCONTROLLER
RS=0; Register select should be low

R/W=0; Read/Write pin should be low.
Program:
#include¡reg51.h¿
sbit rs=P30̂;
sbit rw=P31̂;
sbit en=P32̂;
void lcdcmd(unsigned char);
void lcddat (unsigned char);
void delay();
void main()
{
P2=0x00;
while(1)
{
lcdcmd(0x38);
delay();
lcdcmd(0x01);
delay();
lcdcmd(0x10);
delay();
lcdcmd(0x0c);
delay();
lcdcmd(0x81);
delay();
lcddat(’I’);
delay();
lcddat(’A’);
delay();
lcddat(’R’);
delay();
lcddat(’E’);
delay();
}
}
void lcdcmd(unsigned char val)
{
P2=val;
rs=0;
rw=0;
en=1;
delay();
en=0;
}
void lcddat(unsigned char val)
{
P2=val;
rs=1;
rw=0;
en=1;
delay();
en=0;
}
void delay()
{
unsigned int i;
for(i=0;i¡6000;i++);
}
2.8 ADC (ADC0808) INTERFACING WITH 8051 MICROCONTROLLER:
ADC0808/ADC0809 is an 8 channel 8-bit analog to digital converter. Unlike ADC0804 which has
one Analog channel, this ADC has 8 multiplexed analog input channels. This tutorial will provide
you basic information regarding this ADC, testing in free run mode and interfacing example with
8051 with sample program in C and assembly.
IN0-IN7: Analog Input channels
D0-D7: Data Lines
A, B, C: Analog Channel select lines; A is LSB and C is MSB
F IGURE 2.35: ADC (ADC0808) INTERFACING WITH 8051 MICROCONTROLLER
OE: Output enable signal

ALE: Address Latch Enable
EOC: End of Conversion signal
Vref+/Vref-: Differential Reference voltage input
Clock: External ADC clock input
Normally analogue-to-digital converter (ADC) needs interfacing through a microprocessor to con-
vert analogue data into digital format. This requires hardware and necessary software, resulting
in increased complexity and hence the total cost. The circuit of A-to-D converter shown here is
configured around ADC 0808, avoiding the use of a microprocessor. The ADC 0808 is an 8-bit
A-to-D converter, having data lines D0-D7. It works on the principle of successive approximation.
It has a total of eight analogue input channels, out of which any one can be selected using address
lines A, B and C. Here, in this case, input channel IN0 is selected by grounding A, B and C address
lines.
Usually the control signals EOC (end of conversion), SC (start conversion), ALE (address latch
enable) and OE (output enable) are interfaced by means of a microprocessor. However, the circuit
shown here is built to operate in its continuous mode without using any microprocessor. Therefore
the input control signals ALE and OE, being active-high, are tied to Vcc (+5 volts). The input con-
trol signal SC, being active-low, initiates start of conversion at falling edge of the pulse, whereas
the output signal EOC becomes high after completion of digitization. This EOC output is coupled
to SC input, where falling edge of EOC output acts as SC input to direct the ADC to start the
conversion.
As the conversion starts, EOC signal goes high. At next clock pulse EOC output again goes low,
and hence SC is enabled to start the next conversion. Thus, it provides continuous 8-bit digital
output corresponding to instantaneous value of analogue input. The maximum level of analogue
input voltage should be appropriately scaled down below positive reference (+5V) level.
The ADC 0808 IC requires clock signal of typically 550 kHz, which can be easily derived from
an Astable multivibrator, constructed using 7404 inverter gates. In order to visualize the digi-
tal output, the row of eight LEDs (LED1 through LED8) have been used, where in each LED is
connected to respective data lines D0 through D7. Since ADC works in the continuous mode, it
displays digital output as soon as analogue input is applied. The decimal equivalent digital output
value D for a given analogue input voltage Vin can be calculated from the relationship.
Program:
F IGURE 2.36: Clock input
F IGURE 2.37: INTERFACING WITH 8051 MICROCONTROLLER

#include ¡reg51.h¿
#define ALE P3 4
#define OE P3 7
#define START P3 5
#define EOC P3 6
#define SEL A P3 1
#define SEL B P3 2
#define SEL C P3 3
#define ADC DATA P1
void main()
{
unsigned char adc data;
/* Data port to input */
ADC DATA = 0xFF;
EOC = 1; /* EOC as input */
ALE = OE = START = 0;
while (1)
{
/* Select channel 1 */
SEL A = 1; /* LSB */
SEL B = 0;
SEL C = 0; /* MSB */
/* Latch channel select/address */
ALE = 1; /* Start conversion */
START = 1;
ALE = 0;
START = 0;
/* Wait for end of conversion */
while (EOC == 1);
while (EOC == 0);
/* Assert Read signal */
OE = 1;
/* Read Data */
adc data = ADC DATA;
OE = 0;
/* Now adc data is stored */
/* start over for next conversion */
}
}
2.9 DAC INTERFACING WITH 8051 MICROCONTROLLER:
This section will show how to interface a DAC (digital-to-analog converter) to the 8051. Then we
demonstrate how to generate a sine wave on the scope using the DAC.
2.9.1 Digital-to-analog (DAC) converter
The digital-to-analog converter (DAC) is a device widely used to convert digital pulses to analog
signals. In this section we discuss the basics of interfacing a DAC to the 8051.
Recall from your digital electronics book the two methods of creating a DAC:
Binary weighted.
R/2R ladder.
The vast majority of integrated circuit DACs, including the MC1408 (DAC0808) used in this sec-
tion use the R/2R method since it can achieve a much higher degree of precision. The first criterion
for judging a DAC is its resolution, which is a function of the number of binary inputs. The com-
mon ones are 8, 10, and 12 bits. The number of data bit inputs decides the resolution of the DAC
since the number of analog output levels is equal to 2, where n is the number of data bit inputs.
Therefore, an 8-input DAC such as the DAC0808 provides 256 discrete voltage (or current) levels
of output. Similarly, the 12-bit DAC provides 4096 discrete voltage levels. There are also 16-bit
DACs, but they are more expensive.
2.9.2 MC1408 DAC (or DAC0808)
In the MC1408 (or DAC0808), the digital inputs are converted to current (Iout), and by connecting
a resistor to the Iout pin, we convert the result to voltage.
The total current provided by the Iout pin is a function of the binary numbers at the DO – D7
inputs of the DAC0808 and the reference current (Iref), and is as follows:
I out=I ref (D7/2+D6/4+D5/8+D4/16+D3/32+D2/64+D1/128+D0/256)
Where DO is the LSB, D7 is the MSB for the inputs, and Iref is the input current that must be
applied to pin 14. The Iref current is generally set to 2.0 mA. Figure shows the generation of
current reference (setting Iref = 2 mA) by using the standard 5-V power supply and IK and 1.5K-
ohm standard resistors. Some DACs also use the zener diode (LM336), which overcomes any
fluctuation associated
F IGURE 2.38: 8051 Connection to DAC808

2.9.3 Converting lout to voltage in DAC0808
Ideally we connect the output pin Iout to a resistor, convert this current to voltage, and monitor
the output on the scope. In real life, however, this can cause inaccuracy since the input resistance
of the load where it is connected will also affect the output voltage. For this reason, the Iref
current output is isolated by connecting it to an op-amp such as the 741 with Rf = 5K ohms for the
feedback resistor. Assuming that R = 5K ohms, by changing the binary input, the output voltage
change
2.9.4 Generating a sine wave:
To generate a sine wave, we first need a table whose values represent the magnitude of the sine
of angles between 0 and 360 degrees. The values for the sine function vary from -1.0 to +1.0
for 0- to 360-degree angles. Therefore, the table values are integer numbers representing the
voltage magnitude for the sine of theta. This method ensures that only integer numbers are output
to the DAC by the 8051 microcontroller. Table shows the angles, the sine values, the voltage
magnitudes, and the integer values representing the voltage magnitude for each angle (with 30-
degree increments). To generate figure shown below , we assumed the full-scale voltage of 10 V
for DAC output. Full-scale output of the DAC is achieved when all the data inputs of the DAC are
high. Therefore, to achieve the full-scale 10 V output, we use the following equation.
Program:
#include ¡reg51.h¿
sfr DACDATA = Pl;
void main ()
{
unsigned char WAVEVALUE [12]=128,192,238,255, 238,192,128,64, 17,0,17,64 ;
unsigned char x ,
while (1)
{
F IGURE 2.39: Generating a sine wave
for(x=0;x¡12;x++)
{
DACDATA = WAVEVALUE[x];
}
}
}
2.10 MULTIPLE INTERRUPTS IN 8051 MICROCONTROLLER:
2.10.1 Interrupts vs. polling:
A single microcontroller can serve several devices. There are two ways to do that: interrupts or
polling. In the interrupt method, whenever any device needs its service the device notifies the
microcontroller by sending it an interrupt signal. Upon receiving an interrupt signal, the micro-
controller interrupts whatever it is doing and serves the device.
F IGURE 2.40: Angle vs. Voltage Magnitude for Sine Wave
The program associated with the interrupt is called the interrupt service routine (ISR) or interrupt
handler.
In polling, the microcontroller continuously monitors the status of a given device; when the status
condition is met, it performs the service. After that, it moves on to monitor the next device until
each one is serviced. Although polling can monitor the status of several devices and serve each of
them as certain conditions are met, it is not an efficient use of the microcontroller.
The advantage of interrupts is that the microcontroller can serve many devices (not all at the same
time, of course); each device can get the attention of the microcontroller based on the priority
assigned to it.
The polling method cannot assign priority since it checks all devices in a round robin fashion.
2.10.2 SIX INTERRUPTS IN THE 8051 MICROCONTROLLER:
In reality, only five interrupts are available to the user in the 8051, but many manufacturers data
sheets state that there are six interrupts since they include reset. The six interrupts in the 8051 are
allocated as follows.
1. Reset. When the reset pin is activated, the 8051 jumps to address location 0000. This is the
power-up reset.
2. Two interrupts are set aside for the timers: one for Timer 0 and one for Timer1. Memory loca-
tions 000BH and 001BH in the interrupt vector table belong to Timer 0 and Timer 1, respectively.
3. Two interrupts are set aside for hardware external hardware interrupts, Pin numbers 12 (P3.2)
and 13 (P3.3) in port 3 are for the external hardware interrupts INT 0 and INT 1, respectively.
These external interrupts are also referred to as EX 1 and EX 2. Memory locations 0003H and
0013H In the interrupt vector table are assigned to INT0 and INT1, respectively.
4. Serial communication has a single interrupt that belongs to both receive and transmit. The
interrupt vector table location 0023H belongs to this interrupt.
F IGURE 2.41: Interrupt Vector Table for the 8051
2.10.3 Enabling and Disabling an interrupt:
Upon reset, all interrupts are disabled (masked), meaning that none will be responded to by the
microcontroller if they are activated. The interrupts must be enabled by software in order for the
microcontroller to respond to them. There is a register called IE (interrupt enable) that is respon-
sible for enabling (unmasking) and disabling (masking) the interrupts.
Figure shows the IE register. Note that IE is a bit-addressable register. From figure notice that bit
D7 in the IE register is called EA (enable all). This must be set to 1 in order for the rest of the
register to take effect. D6 is unused. D5 is used by the 8052. The D4 bit is for the serial interrupt,
and so on.
Steps in enabling an interrupt: To enable an interrupts, we take the following steps:
1. Bit D7 of the IE register (EA) must be set to high to allow the rest of register to take the effect.
2. If EA =1, interrupts are enabled and will be responded to if their corresponding bits in IE are
high.
If EA=0, no interrupt will be responded to, even if the associated bit in the IE register is high.
2.11 IE (Interrupt Enable) Register:
• This register is responsible for enabling and disabling the interrupt.

• EA register is set to 1 for enabling interrupts and
• EA register is set to 0 for disabling the interrupts.
• Its bit sequence and their meanings are shown in the following figure.
F IGURE 2.42: Interrupt Enable) Register
F IGURE 2.43: Interrupt Enable) Register table

2.12 SERIAL COMMUNICATION PROGRAMMING:
Serial Communication can be

• Asynchronous
• Synchronous
2.12.1 Synchronous Communication:
Synchronous methods transfer a block of data (characters) at a time The events are referenced to a
clock
Example: SPI bus, I2C bus
Asynchronous Communication: Asynchronous methods transfer a single byte at a time There is
no clock. The bytes are separated by start and stop bits. Example: UART
Serial port programming in assembly
Since IBM PC/compatible computers are so widely used to communicate with 8051-based sys-
tems, serial communications of the 8051 with the COM port of the PC will be emphasized. To
allow data transfer between the PC and an 8051 system without any error, we must make sure that
the baud rate of the 8051 system matches the baud rate of the PCs COM port.
2.13 Baud rate in the 8051
The 8051 transfers and receives data serially at many different baud rates. Serial communications
of the 8051 is established with PC through the COM port. It must make sure that the baud rate of
the 8051 system matches the baud rate of the PC’s COM port/ any system to be interfaced. The
baud rate in the 8051 is programmable. This is done with the help of Timer. When used for serial
port, the frequency of timer tick is determined by (XTAL/12)/32 and 1 bit is transmitted for each
timer period (the time duration from timer start to timer expiry).
The Relationship between the crystal frequency and the baud rate in the 8051 is that the 8051
divides the crystal frequency by 12 to get the machine cycle frequency which is shown in figure1.
Here the oscillator is XTAL = 11.0592 MHz, the machine cycle frequency is 921.6 kHz. 8051’s
UART divides the machine cycle frequency of 921.6 kHz by 32 once more before it is used by
Timer 1 to set the baud rate. 921.6 kHz divided by 32 gives 28,800 Hz. Timer 1 must be pro-
grammed in mode 2, that is 8-bit, auto-reload. In serial communication if data transferred with a
baud rate of 9600 and XTAL used is 11.0592 then following is the steps followed to find the TH1
value to be loaded.
Clock frequency of timer clock: f = (11.0592 MHz / 12)/32 = 28,800Hz Time period of each
F IGURE 2.44: Calculation of baud rate
clock tick: T0 = 1/f = 1/28800

Duration of timer : n*T0 (n is the number of clock ticks) 9600 baud -¿duration of 1 symbol:
1/9600 1/9600 = n*T0 = n*1/28800
n = f/9600 = 28800/9600 = 3 -¿TH1 =-3
Similarly, for baud 2400
n = f/2400 = 12 -¿TH1 = -12
Example: set baud rate at 9600
MOV TMOD, #20H ; timer 1,mode 2(auto reload) MOV TH1, #-3 ; To set 9600 baud rate
SETB TR1; start timer 1
2.13.1 Baud rate selection
Baud rate is selected by timer1 and when Timer 1 is used to set the baud rate it must be pro-
grammed in mode 2 that is 8-bit, auto-reload. To get baud rates compatible with the PC, we must
load TH1 with the values shown below Baud rate can be increase by two ways-
F IGURE 2.45: Timer 1 THI register values for different baud rates
1. Increasing frequency of crystal

2. Change bit in PCON register
PCON
It is 8-bit register. When 8051 is powered up, SMOD is zero. By setting the SMOD, baud rate
can be doubled. If SMOD = 0 (which is its value on reset), the baud rate is 1/64 the oscillator
frequency. If SMOD = 1, the baud rate is 1/32 the oscillator frequency.
F IGURE 2.46: Comparison of Baud rate

Chapter 3
RTOS BASED EMBEDDED SYSTEM

DESIGN
Course Outcomes
CO 3 Demonstrate the principles of RTOS and the methods used for saving Understand
memory and power in real time environments.
3.1 Introduction
Operating System Basics:

The Operating System acts as a bridge between the user applications/tasks and the underlying
system resources through a set of system functionalities and services. OS manages the system
resources and makes them available to the user applications / tasks on a need basis.
101
Chapter 3. RTOS BASED EMBEDDED SYSTEM DESIGN 102
F IGURE 3.1: Architecture of Operating System
3.2 The Kernel

1. The kernel is the core of the operating system
2. It is responsible for managing the system resources and the communication among the hard-
ware and other system services
3. Kernel acts as the abstraction layer between system resources and user applications
4. Kernel contains a set of system libraries and services.
5. For a general purpose OS, the kernel contains different services like
(a) Process Management
(b) Primary Memory Management
(c) File System management
(d) I/O System (Device) Management
(e) Secondary Storage Management
(f) Protection
(g) Time management
(h) Interrupt Handling

Kernel Space and User Space:
1. The program code corresponding to the kernel applications/services are kept in a contiguous
area (OS dependent) of primary (working) memory and is protected from the un-authorized
access by user programs/applications.
2. The memory space at which the kernel code is located is known as ‘Kernel Space’.
3. All user applications are loaded to a specific area of primary memory and this memory area
is referred as ‘User Space’.
4. The partitioning of memory into kernel and user space is purely Operating System depen-
dent.
5. An operating system with virtual memory support, loads the user applications into its cor-
responding virtual memory space with demand paging technique. Most of the operating
systems keep the kernel application code in main memory and it is not swapped out into the
secondary memory.
Monolithic Kernel:
1. All kernel services run in the kernel space
2. All kernel modules run within the same memory space under a single kernel thread
3. The tight internal integration of kernel modules in monolithic kernel architecture allows the
effective utilization of the low-level features of the underlying system
4. The major drawback of monolithic kernel is that any error or failure in any one of the kernel
modules leads to the crashing of the entire kernel application.
5. LINUX, SOLARIS, MS-DOS kernels are examples of monolithic kernel

F IGURE 3.2: Monolithic Kernel Model
Microkernel:
The microkernel design incorporates only the essential set of Operating System services into the
kernel.
1. The rest of the Operating System services are implemented in programs known as ’Servers’
which runs in user space.
2. The kernel design is highly modular provides OS-neutral abstraction.
3. Memory management, process management, timer systems and interrupt handlers are ex-
amples of essential services, which forms the part of the microkernel.Examples for micro-
kernel: QNX, Minix 3 kernels.
Benefits of Microkernel:
1. Robustness: If a problem is encountered in any services in server can reconfigured and re-
started without the need for re-starting the entire OS.
2.Configurability: Any services, which run as ‘server’ application can be changed without need
to restart the whole system.
3.3 Types of Operating Systems
Depending on the type of kernel and kernel services, purpose and type of computing systems
where the OS is deployed and the responsiveness to applications, Operating Systems are classified
into
1. General Purpose Operating System (GPOS).
2. Real Time Purpose Operating System (RTOS).
General Purpose Operating System (GPOS):
1. Operating Systems, which are deployed in general computing systems
2. The kernel is more generalized and contains all the required services to execute generic
applications
3. Need not be deterministic in execution behavior
4. May inject random delays into application software and thus cause slow responsiveness of
an application at unexpected times
5. Usually deployed in computing systems where deterministic behavior is not an important

criterion
6. Personal Computer/Desktop system is a typical example for a system where GPOSs are
deployed.
7. Windows XP/MS-DOS etc are examples of General Purpose Operating System
Real Time Purpose Operating System (RTOS):
1. Operating Systems, which are deployed in embedded systems demanding real-time re-
sponse
2. Deterministic in execution behavior. Consumes only known amount of time for kernel ap-
plications
3. Implements scheduling policies for executing the highest priority task/application always
4. Implements policies and rules concerning time-critical allocation of a system’s resources
5. Windows CE, QNX, VxWorks , MicroC/OS-II etc are examples of Real Time Operating
Systems (RTOS)
The Real Time Kernel: The kernel of a Real Time Operating System is referred as Real Time
kernel. In complement to the conventional OS kernel, the Real Time kernel is highly specialized
and it contains only the minimal set of services required for running the user applications/tasks.
The basic functions of a Real Time kernel are
1. Task/Process management
2. Task/Process scheduling
3. Task/Process synchronization
4. Error/Exception handling
5. Memory Management
6. Interrupt handling
7. Time management
Real Time Kernel Task/Process Management: Deals with setting up the memory space for the
tasks, loading the task’s code into the memory space, allocating system resources, setting up a
Task Control Block (TCB) for the task and task/process termination/deletion. A Task Control
Block (TCB) is used for holding the information corresponding to a task. TCB usually contains
the following set of information
1. Task ID: Task Identification Number
2. Task State: The current state of the task. (E.g. State= ‘Ready’ for a task which is ready to
execute)
3. Task Type: Task type. Indicates what is the type for this task. The task can be a hard real
time or soft real time or background task.
4. Task Priority: Task priority (E.g. Task priority =1 for task with priority = 1)
5. Task Context Pointer: Context pointer. Pointer for context saving
6. Task Memory Pointers: Pointers to the code memory, data memory and stack memory for
the task
7. Task System Resource Pointers: Pointers to system resources (semaphores, mutex etc) used
by the task
8. Task Pointers: Pointers to other TCBs (TCBs for preceding, next and waiting tasks)
9. Other Parameters Other relevant task parameters
The parameters and implementation of the TCB is kernel dependent. The TCB parameters vary
across different kernels, based on the task management implementation
1. Task/Process Scheduling: Deals with sharing the CPU among various tasks/processes. A
kernel application called ‘Scheduler’ handles the task scheduling. Scheduler is nothing but
an algorithm implementation, which performs the efficient and optimal scheduling of tasks
to provide a deterministic behavior.
2. Task/Process Synchronization: Deals with synchronizing the concurrent access of a re-

source, which is shared across multiple tasks and the communication between various tasks.
3. Error/Exception handling: Deals with registering and handling the errors occurred/excep-
tions raised during the execution of tasks. Insufficient memory, timeouts, deadlocks, dead-
line missing, bus error, divide by zero, unknown instruction execution etc, are examples of
errors/exceptions. Errors/Exceptions can happen at the kernel level services or at task level.
Deadlock is an example for kernel level exception, whereas timeout is an example for a task
level exception. The OS kernel gives the information about the error in the form of a system
call (API).
Memory Management:
1. The memory management function of an RTOS kernel is slightly different compared to the
General Purpose Operating Systems. The memory allocation time increases depending on
the size of the block of memory needs to be allocated and the state of the allocated memory
block (initialized memory block consumes more allocation time than uninitialized memory
block)
2. Since predictable timing and deterministic behavior are the primary focus for an RTOS,
RTOS achieves this by compromising the effectiveness of memory allocation
3. RTOS generally uses ‘block’ based memory allocation technique, instead of the usual dy-
namic memory allocation techniques used by the GPOS.
4. RTOS kernel uses blocks of fixed size of dynamic memory and the block is allocated for a
task on a need basis. The blocks are stored in a ‘Free buffer Queue’.
5. Most of the RTOS kernels allow tasks to access any of the memory blocks without any
memory protection to achieve predictable timing and avoid the timing overheads
6. RTOS kernels assume that the whole design is proven correct and protection is unnecessary.
Some commercial RTOS kernels allow memory protection as optional and the kernel enters
a fail-safe mode when an illegal memory access occurs
7. The memory management function of an RTOS kernel is slightly different compared to the
General Purpose Operating Systems
8. A few RTOS kernels implement Virtual Memory concept for memory allocation if the sys-
tem supports secondary memory storage (like HDD and FLASH memory).
9. In the ‘block’ based memory allocation, a block of fixed memory is always allocated for
tasks on need basis and it is taken as a unit. Hence, there will not be any memory fragmen-
tation issues.
10. The memory allocation can be implemented as constant functions and thereby it consumes
fixed amount of time for memory allocation. This leaves the deterministic behavior of the
RTOS kernel untouched.
Interrupt Handling:
1. Interrupts inform the processor that an external device or an associated task requires imme-
diate attention of the CPU.
2. Interrupts can be either Synchronous or Asynchronous.
3. Interrupts which occurs in sync with the currently executing task is known as Synchronous
interrupts. Usually the software interrupts fall under the Synchronous Interrupt category.
Divide by zero, memory segmentation error etc are examples of Synchronous interrupts.
4. For synchronous interrupts, the interrupt handler runs in the same context of the interrupting
task.
5. Asynchronous interrupts are interrupts, which occurs at any point of execution of any task,
and are not in sync with the currently executing task.
6. The interrupts generated by external devices (by asserting the Interrupt line of the proces-
sor/controller to which the interrupt line of the device is connected) connected to the proces-
sor/controller, timer overflow interrupts, and serial data reception / transmission interrupts
etc are examples for asynchronous interrupts.
7. For asynchronous interrupts, the interrupt handler is usually written as separate task (De-
pends on OS Kernel implementation) and it runs in a different context. Hence, a context
switch happens while handling the asynchronous interrupts.
8. Priority levels can be assigned to the interrupts and each interrupts can be enabled or disabled
individually.
9. Most of the RTOS kernel implements ‘Nested Interrupts’ architecture. Interrupt nesting
allows the pre-emption (interruption) of an Interrupt Service Routine (ISR), servicing an
interrupt, by a higher priority interrupt.
Time Management:
1. Interrupts inform the processor that an external device or an associated task requires imme-
diate attention of the CPU.
2. Accurate time management is essential for providing precise time reference for all applica-
tions
3. The time reference to kernel is provided by a high-resolution Real Time Clock (RTC) hard-
ware chip (hardware timer)
4. The hardware timer is programmed to interrupt the processor/controller at a fixed rate. This
timer interrupt is referred as ‘Timer tick’
5. The ‘Timer tick’ is taken as the timing reference by the kernel. The ‘Timer tick’ interval may
vary depending on the hardware timer. Usually the ‘Timer tick’ varies in the microseconds
range
6. The time parameters for tasks are expressed as the multiples of the ‘Timer tick’
7. The System time is updated based on the ‘Timer tick’
8. If the System time register is 32 bits wide and the ‘Timer tick’ interval
is 1microsecond, the System time register will reset in
232 * 10-6/ (24 * 60 * 60) = 49700 Days =∼ 0.0497 Days = 1.19 Hours
If the ‘Timer tick’ interval is 1 millisecond, the System time register will reset in
232 * 10-3 / (24 * 60 * 60) = 497 Days = 49.7 Days =∼ 50 Days
The ‘Timer tick’ interrupt is handled by the ‘Timer Interrupt’ handler of kernel. The ‘Timer tick’
interrupt can be utilized for implementing the following actions.
1. Save the current context (Context of the currently executing task)
2. Increment the System time register by one. Generate timing error and reset the System time
register if the timer tick count is greater than the maximum range available for System time
register
3. Update the timers implemented in kernel (Increment or decrement the timer registers for
each timer depending on the count direction setting for each register. Increment registers
with count direction setting = ‘count up’ and decrement registers with count direction setting
= ‘count down’)
4. Activate the periodic tasks, which are in the idle state
5. Invoke the scheduler and schedule the tasks again based on the scheduling algorithm
6. Delete all the terminated tasks and their associated data structures (TCBs)
7. Load the context for the first task in the ready queue. Due to the re- scheduling, the ready
task might be changed to a new one from the task, which was pre-empted by the ‘Timer
Interrupt’ task
Hard Real-time System:
1. A Real Time Operating Systems which strictly adheres to the timing constraints for a task.
2. A Hard Real Time system must meet the deadlines for a task without any slippage
3. Missing any deadline may produce catastrophic results for Hard Real Time Systems, includ-
ing permanent data lose and irrecoverable damages to the system/users
4. Emphasize on the principle ‘A late answer is a wrong answer’
5. Air bag control systems and Anti-lock Brake Systems (ABS) of vehicles are typical exam-
ples of Hard Real Time Systems
6. As a rule of thumb, Hard Real Time Systems does not implement the virtual memory model
for handling the memory. This eliminates the delay in swapping in and out the code corre-
sponding to the task to and from the primary memory
7. The presence of Human in the loop (HITL) for tasks introduces unexpected delays in the
task execution. Most of the Hard Real Time Systems are automatic and does not contain a
‘human in the loop’
Soft Real-time System:
1. Real Time Operating Systems that does not guarantee meeting deadlines, but, offer the best
effort to meet the deadline
2. Missing deadlines for tasks are acceptable if the frequency of deadline missing is within the
compliance limit of the Quality of Service(QoS)
3. A Soft Real Time system emphasizes on the principle ‘A late answer is an acceptable an-
swer, but it could have done bit faster’
4. Soft Real Time systems most often have a ‘human in the loop (HITL)’
5. Automatic Teller Machine (ATM) is a typical example of Soft Real Time System. If the
ATM takes a few seconds more than the ideal operation time, nothing fatal happens.
6. An audio video play back system is another example of Soft Real Time system. No potential
damage arises if a sample comes late by fraction of a second, for play back.
Tasks, Processes & Threads:
1. In the Operating System context, a task is defined as the program in execution and the related
information maintained by the Operating system for the program
2. Task is also known as ‘Job’ in the operating system context
3. A program or part of it in execution is also called a ‘Process’
4. The terms ‘Task’, ‘job’ and ‘Process’ refer to the same entity in the Operating System con-
text and most often they are used interchangeably
5. A process requires various system resources like CPU for executing the process, memory
for storing the code corresponding to the process and associated variables, I/O devices for
information exchange etc
3.4 The structure of a Processes
1. The concept of ‘Process’ leads to concurrent execution (pseudo parallelism) of tasks and
thereby the efficient utilization of the CPU and other system resources
2. Concurrent execution is achieved through the sharing of CPU among the processes.
3. A process mimics a processor in properties and holds a set of registers, process status, a
Program Counter (PC) to point to the next executable instruction of the process, a stack for
holding the local variables associated with the process and the code corresponding to the
process
4. A process, which inherits all the properties of the CPU, can be considered as a virtual pro-
cessor, awaiting its turn to have its properties switched into the physical processor
5. When the process gets its turn, its registers and Program counter register becomes mapped
to the physical registers of the CPU
F IGURE 3.3: Structure of a Process
3.5 Memory organization of Processes:
The memory occupied by the process is segregated into three regions namely; Stack memory, Data
memory and Code memory. The Stack memory holds all temporary data such as variables local to
the process
Data memory holds all global data for the process
The Code memory contains the program code (instructions) corresponding to the process
F IGURE 3.4: Memory organization of a Process
Process States & State Transition
1. The creation of a process to its termination is not a single step operation
2. The process traverses through a series of states during its transition from the newly created
state to the terminated state
3. The cycle through which a process changes its state from ‘newly created’ to ‘execution
completed’ is known as ‘Process Life Cycle’. The various states through which a process
traverses through during a Process Life Cycle indicates the current status of the process with
respect to time and also provides information on what it is allowed to do next
Process States & State Transition:
1. Created State: The state at which a process is being created is referred as ‘Created State’.
The Operating System recognizes a process in the ‘Created State’ but no resources are allo-
cated to the process
2. Ready State: The state, where a process is incepted into the memory and awaiting the
processor time for execution, is known as ‘Ready State’. At this stage, the process is placed
in the ‘Ready list’ queue maintained by the OS
3. Running State: The state where in the source code instructions corresponding to the process
is being executed is called ‘Running State’. Running state is the state at which the process
execution happens
4. Blocked State/Wait State: Refers to a state where a running process is temporarily sus-
pended from execution and does not have immediate access to resources. The blocked state
might have invoked by various conditions like- the process enters a wait state for an event
to occur (E.g. Waiting for user inputs such as keyboard input) or waiting for getting access
to a shared resource like semaphore, mutex etc
5. Completed State: A state where the process completes its execution
The transition of a process from one state to another is known as ‘State transition’ When a process
changes its state from Ready to running or from running to blocked or terminated or from blocked
to running, the CPU allocation for the process may also change
F IGURE 3.5: Process states and State transition
3.6 Threads
1. A thread is the primitive that can execute code
2. A thread is a single sequential flow of control within a process
3. ‘Thread’ is also known as lightweight process
4. A process can have many threads of execution

5. Different threads, which are part of a process, share the same address space; meaning they
share the data memory, code memory and heap memory area
6. Threads maintain their own thread status (CPU register values), Program Counter (PC) and
stack
F IGURE 3.6: Memory organization of process and its associated Threads

Thread V/s Process
Thread Process
Thread is a single unit of execution and is part of Process is a program in execution and con-
process. tains one or more threads.
A thread does not have its own data memory and Process has its own code memory, data
heap memory. It shares the data memory and heap memory and stack memory.
memory with other threads of the same process.
A thread cannot live independently; it lives within A process contains at least one thread.
the process.
There can be multiple threads in a process. The Threads within a process share the code,
first thread (main thread) calls the main function data and heap memory. Each thread holds
and occupies the start of the stack memory of the separate memory area for stack (shares the
process. total stack memory of the process).
Threads are very inexpensive to create Processes are very expensive to create. In-
volves many OS overhead.
Context switching is inexpensive and fast Context switching is complex and involves
lot of OS overhead and is comparatively
slower.
If a thread expires, its stack is reclaimed by the If a process dies, the resources allocated to
process. it are reclaimed by the OS and all the asso-
ciated threads of the process also dies.
Advantages of Threads:
1. Better memory utilization: Multiple threads of the same process share the address space
for data memory. This also reduces the complexity of inter thread communication since
variables can be shared across the threads.
2. Efficient CPU utilization: The CPU is engaged all time.
3. Speeds up the execution of the process: The process is split into different threads, when
one thread enters a wait state, the CPU can be utilized by other threads of the process that
do not require the event, which the other thread is waiting, for processing.
3.7 Multiprocessing & Multitasking
1. The ability to execute multiple processes simultaneously is referred as multiprocessing
2. Systems which are capable of performing multiprocessing are known as multiprocessor

systems
3. Multiprocessor systems possess multiple CPUs and can execute multiple processes simulta-
neously
4. The ability of the Operating System to have multiple programs in memory, which are ready
for execution, is referred as multiprogramming
5. Multitasking refers to the ability of an operating system to hold multiple processes in mem-
ory and switch the processor (CPU) from executing one process to another process
6. Multitasking involves ‘Context switching’, ‘Context saving’ and ‘Context retrieval’
7. Context switching refers to the switching of execution context from task to other
8. When a task/process switching happens, the current context of execution should be saved
to (Context saving) retrieve it at a later point of time when the CPU executes the process,
which is interrupted currently due to execution switching
9. During context switching, the context of the task to be executed is retrieved from the saved
context list. This is known as Context retrieval.
Multitasking – Context Switching:
F IGURE 3.7: Multitasking Context Switching
Types of Multitasking:
Depending on how the task/process execution switching act is implemented, multitasking can is
classified into
1. Co-operative Multitasking: Co-operative multitasking is the most primitive form of mul-

titasking in which a task/process gets a chance to execute only when the currently executing
task/process voluntarily relinquishes the CPU. In this method, any task/process can avail
the CPU as much time as it wants. Since this type of implementation involves the mercy
of the tasks each other for getting the CPU time for execution, it is known as co-operative
multitasking. If the currently executing task is non-cooperative, the other tasks may have to
wait for a long time to get the CPU
2. Preemptive Multitasking: Preemptive multitasking ensures that every task/process gets a

chance to execute. When and how much time a process gets is dependent on the imple-
mentation of the preemptive scheduling. As the name indicates, in preemptive multitasking,
the currently running task/process is preempted to give a chance to other tasks/process to

execute. The preemption of task may be based on time slots or task/process priority
(a) Non-preemptive Multitasking: The process/task, which is currently given the CPU
time, is allowed to execute until it terminates (enters the ‘Completed’ state) or enters
the ‘Blocked/Wait’ state, waiting for an I/O. The co- operative and non-preemptive
multitasking differs in their behavior when they are in the ‘Blocked/Wait’ state. In
co-operative multitasking, the currently executing process/task need not relinquish the
CPU when it enters the ‘Blocked/Wait’ sate, waiting for an I/O, or a shared resource
access or an event to occur whereas in non-preemptive multitasking the currently exe-
cuting task relinquishes the CPU when it waits for an I/O.
Task Scheduling:
1. In a multitasking system, there should be some mechanism in place to share the CPU among
the different tasks and to decide which process/task is to be executed at a given point of time
2. Determining which task/process is to be executed at a given point of time is known as

task/process scheduling
3. Task scheduling forms the basis of multitasking
4. Scheduling policies forms the guidelines for determining which task is to be executed when.
5. The scheduling policies are implemented in an algorithm and it is run by the kernel as a
service
6. The kernel service/application, which implements the scheduling algorithm, is known as

‘Scheduler’
7. The task scheduling policy can be pre-emptive, non-preemptive or co- operative
8. Depending on the scheduling policy the process scheduling decision may take place when
a process switches its state to
Ready’ state from ‘Running’ state
‘Blocked/Wait’ state from ‘Running’ state
‘Ready’ state from ‘Blocked/Wait’ state
‘Completed’ state
Task Scheduling - Scheduler Selection:
The selection of a scheduling criteria/algorithm should consider the following factors:
1. CPU Utilization: The scheduling algorithm should always make the CPU utilization high.
CPU utilization is a direct measure of how much percentage of the CPU is being utilized.
2. Throughput: This gives an indication of the number of processes executed per unit of time.
The throughput for a good scheduler should always be higher.
3. Turnaround Time: It is the amount of time taken by a process for completing its execution.
It includes the time spent by the process for waiting for the main memory, time spent in the
ready queue, time spent on completing the I/O operations, and the time spent in execution.
The turnaround time should be a minimum for a good scheduling algorithm.
4. Waiting Time: It is the amount of time spent by a process in the ‘Ready’ queue waiting to
get the CPU time for execution. The waiting time should be minimal for a good scheduling
algorithm.
5. Response Time: It is the time elapsed between the submission of a process and the first
response. For a good scheduling algorithm, the response time should be as least as possible.
Task Scheduling - Queues
The various queues maintained by OS in association with CPU scheduling are:
1. Job Queue: Job queue contains all the processes in the system
2. Ready Queue: Contains all the processes, which are ready for execution and waiting for
CPU to get their turn for execution. The Ready queue is empty when there is no process
ready for running.
3. Device Queue: Contains the set of processes, which are waiting for an I/O device
Task Scheduling – Task transition through various Queues
F IGURE 3.8: Non-preemptive scheduling – First Come First Served (FCFS)/First In First Out
(FIFO) Scheduling
Non-preemptive scheduling – First Come First Served (FCFS)/First In First Out (FIFO)
Scheduling
1. Allocates CPU time to the processes based on the order in which they enters the ‘Ready’
queue
2. The first entered process is serviced first
3. It is same as any real world application where queue systems are used; E.g. Ticketing
Drawbacks:
1. Favors monopoly of process. A process, which does not contain any I/O operation, contin-
ues its execution until it finishes its task
2. In general, FCFS favors CPU bound processes and I/O bound processes may have to wait
until the completion of CPU bound process, if the currently executing process is a CPU
bound process. This leads to poor device utilization.
3. The average waiting time is not minimal for FCFS scheduling algorithm
EXAMPLE: Three processes with process IDs P1, P2, P3 with estimated completion time 10, 5,
7 milliseconds respectively enters the ready queue together in the order P1, P2, P3. Calculate the
waiting time and Turn around Time (TAT) for each process and the Average waiting time and Turn
Around Time (Assuming there is no I/O waiting for the processes).
Solution: The sequence of execution of the processes by the CPU is represented as
Assuming the CPU is readily available at the time of arrival of P1, P1 starts executing without any
waiting in the ‘Ready’ queue. Hence the waiting time for P1 is zero.
Waiting Time for P1 = 0 ms (P1 starts executing first)
Waiting Time for P2 = 10 ms (P2 starts executing after completing P1)
Waiting Time for P3 = 15 ms (P3 starts executing after completing P1 and P2)
Average waiting time = (Waiting time for all processes) / No. of Processes
= (Waiting time for (P1+P2+P3)) / 3
= (0+10+15)/3 = 25/3 = 8.33 milliseconds
Turn Around Time (TAT) for P1 = 10 ms (Time spent in Ready Queue +
Execution Time)
Turn Around Time (TAT) for P2 = 15 ms (-Do-) Turn Around Time (TAT) for P3 = 22 ms (-Do-)
Average Turn around Time= (Turn around Time for all processes) / No. of Processes
= (Turn Around Time for (P1+P2+P3)) / 3
= (10+15+22)/3 = 47/3
= 15.66 milliseconds
Non-preemptive scheduling – Last Come First Served (LCFS)/Last In First Out (LIFO)
Scheduling:
1. Allocates CPU time to the processes based on the order in which they are entered in the
‘Ready’ queue
2. The last entered process is serviced first
3. LCFS scheduling is also known as Last In First Out (LIFO) where the process, which is put
last into the ‘Ready’ queue, is serviced first
Drawbacks:
1. Favors monopoly of process. A process, which does not contain any I/O operation, contin-
ues its execution until it finishes its task
2. In general, LCFS favors CPU bound processes and I/O bound processes may have to wait
until the completion of CPU bound process, if the currently executing process is a CPU
bound process. This leads to poor device utilization.
3. The average waiting time is not minimal for LCFS scheduling algorithm
EXAMPLE: Three processes with process IDs P1, P2, P3 with estimated completion time 10, 5, 7
milliseconds respectively enters the ready queue together in the order P1, P2, P3 (Assume only P1
is present in the ‘Ready’ queue when the scheduler picks up it and P2, P3 entered ‘Ready’ queue
after that). Now a new process P4 with estimated completion time 6ms enters the ‘Ready’ queue
after 5ms of scheduling P1. Calculate the waiting time and Turn around Time (TAT) for each
process and the Average waiting time and Turn around Time (Assuming there is no I/O waiting
for the processes).Assume all the processes contain only CPU operation and no I/O operations are
involved.
Solution: Initially there is only P1 available in the Ready queue and the scheduling sequence will
be P1, P3, P2. P4 enters the queue during the execution of P1 and becomes the last process entered
the ‘Ready’ queue. Now the order of execution changes to P1, P4, P3, and P2 as given below.
The waiting time for all the processes are given as Waiting Time for P1 = 0 ms (P1 starts executing
first)
Waiting Time for P4 = 5 ms (P4 starts executing after completing P1. But P4 arrived after 5ms of
execution of P1. Hence its waiting time = Execution start time
1. Arrival Time = 10-5 = 5)
Waiting Time for P2 = 23 ms (P2 starts executing after completing P1, P4 and P3)
= (Waiting time for (P1+P4+P3+P2)) / 4
= (0 + 5 + 16 + 23)/4 = 44/4
= 11 milliseconds
Turn around Time (TAT) for P1 = 10 ms (Time spent in Ready Queue + Execution Time)
Turn around Time (TAT) for P4 = 11 ms (Time spent in Ready Queue +
Execution Time = (Execution Start Time – Arrival Time) + Estimated Execution Time = (10-5) +
6 = 5 + 6)
Turn Around Time (TAT) for P2 = 28 ms (Time spent in Ready Queue + Execution Time)
Average Turn Around Time = (Turn Around Time for all processes) / No. of Processes
= (Turn Around Time for (P1+P4+P3+P2)) / 4

= (10+11+23+28)/4 = 72/4
= 18 milliseconds
Non-preemptive scheduling – Shortest Job First (SJF) Scheduling.
1. (a) Allocates CPU time to the processes based on the execution completion time for tasks
(b) The average waiting time for a given set of processes is minimal in SJF scheduling
(c) Optimal compared to other non-preemptive scheduling like FCFS
Drawbacks:
1. A process whose estimated execution completion time is high may not get a chance to
execute if more and more processes with least estimated execution time enters the ‘Ready’
queue before the process with longest estimated execution time starts its execution
2. May lead to the ‘Starvation’ of processes with high estimated completion time
3. Difficult to know in advance the next shortest process in the ‘Ready’ queue for scheduling
since new processes with different estimated execution time keep entering the ‘Ready’ queue
at any point of time.
Non-preemptive scheduling – Priority based Scheduling
1. (a) A priority, which is unique or same is associated with each task
(b) The priority of a task is expressed in different ways, like a priority number, the time
required to complete the execution etc.
(c) In number based priority assignment the priority is a number ranging from 0 to the
maximum priority supported by the OS. The maximum level of priority is OS depen-
dent.
(d) Windows CE supports 256 levels of priority (0 to 255 priority numbers, with 0 being
the highest priority)
1. (a) The priority is assigned to the task on creating it. It can also be changed dynamically
(If the Operating System supports this feature)
(b) The non-preemptive priority based scheduler sorts the ‘Ready’ queue based on the
priority and picks the process with the highest level of priority for execution.
7 milliseconds and priorities 0, 3, 2 (0- highest priority, 3 lowest priority) respectively enters the
ready queue together. Calculate the waiting time and Turn Around Time (TAT) for each process
and the Average waiting time and Turn Around Time (Assuming there is no I/O waiting for the
processes) in priority based scheduling algorithm.
Solution: The scheduler sorts the ‘Ready’ queue based on the priority and schedules the process
with the highest priority (P1 with priority number 0) first and the next high priority process (P3
with priority number 2) as second and so on. The order in which the processes are scheduled for
execution is represented as
The waiting time for all the processes are given as Waiting Time for P1 = 0 ms (P1 starts executing
first)
Waiting Time for P3 = 10 ms (P3 starts executing after completing P1)
= (0+10+17)/3 = 27/3
= 9 milliseconds
Turn Around Time (TAT) for P3 = 17 ms (-Do-) Turn Around Time (TAT) for P2 = 22 ms (-Do-)
Average Turn Around Time= (Turn Around Time for all processes) / No. of Processes
= (10+17+22)/3 = 49/3
= 16.33 milliseconds Drawbacks:
1. Similar to SJF scheduling algorithm, non-preemptive priority based algorithm also possess
the drawback of ‘Starvation’ where a process whose priority is low may not get a chance
to execute if more and more processes with higher priorities enter the ‘Ready’ queue before
the process with lower priority starts its execution.
2. ‘Starvation’ can be effectively tackled in priority based non-preemptive scheduling by dy-

namically raising the priority of the low priority task/process which is under starvation (wait-
ing in the ready queue for a longer time for getting the CPU time)
3. The technique of gradually raising the priority of processes which are waiting in the ‘Ready’
queue as time progresses, for preventing ‘Starvation’, is known as ‘Aging’.
Preemptive scheduling:
1. (a) Employed in systems, which implements preemptive multitasking model
(b) Every task in the ‘Ready’ queue gets a chance to execute. When and how often each
process gets a chance to execute (gets the CPU time) is dependent on the type of
preemptive scheduling algorithm used for scheduling the processes
(c) The scheduler can preempt (stop temporarily) the currently executing task/process and
select another task from the ‘Ready’ queue for execution
(d) When to pre-empt a task and which task is to be picked up from the ‘Ready’ queue
for execution after preempting the current task is purely dependent on the scheduling
algorithm
(e) A task which is preempted by the scheduler is moved to the ‘Ready’ queue. The act of
moving a ‘Running’ process/task into the ‘Ready’ queue by the scheduler, without the
processes requesting for it is known as‘Preemption’
(f) Time-based preemption and priority-based preemption are the two important approaches
adopted in preemptive scheduling
Preemptive scheduling – Preemptive SJF Scheduling/ Shortest Remaining Time (SRT):
1. The non preemptive SJF scheduling algorithm sorts the ‘Ready’ queue only after the current
process completes execution or enters wait state, whereas the preemptive SJF scheduling
algorithm sorts the ‘Ready’ queue when a new process enters the ‘Ready’ queue and checks
whether the execution time of the new process is shorter than the remaining of the total
estimated execution time of the currently executing process
2. If the execution time of the new process is less, the currently executing process is preempted
and the new process is scheduled for execution
3. Always compares the execution completion time (ie the remaining execution time for the
new process) of a new process entered the ‘Ready’ queue with the remaining time for com-
pletion of the currently executing process and schedules the process with shortest remaining
time for execution.
EXAMPLE: Three processes with process IDs P1, P2, P3 with estimated completion time 10,
5, 7 milliseconds respectively enters the ready queue together. A new process P4 with estimated
completion time 2ms enters the ‘Ready’ queue after 2ms. Assume all the processes contain only
CPU operation and no I/O operations are involved.
Solution: At the beginning, there are only three processes (P1, P2 and P3) available in the ‘Ready’
queue and the SRT scheduler picks up the process with the Shortest remaining time for execution
completion (In this example P2 with remaining time 5ms) for scheduling. Now process P4 with
estimated execution completion time 2ms enters the ‘Ready’ queue after 2ms of start of execution
of P2. The processes are re-scheduled for execution in the following order
The waiting time for all the processes are given as
Waiting Time for P2 = 0 ms + (4 -2) ms = 2ms (P2 starts executing first and is
interrupted by P4 and has to wait till the completion of P4 to get the next CPU slot)
Waiting Time for P4 = 0 ms (P4 starts executing by preempting P2 since the
execution time for completion of P4 (2ms) is less than that of the Remaining time for execution
completion of P2 (Here it is 3ms))
Average waiting time = (Waiting time for all the processes) / No. of Processes
= (0 + 2 + 7 + 14)/4 = 23/4
= 5.75 milliseconds
Turn Around Time (TAT) for P4 = 2 ms (Time spent in Ready Queue + Execution Time
= (Execution Start Time – Arrival Time) + Estimated Execution Time = (2-2) + 2)
Turn around Time (TAT) for P1 = 24 ms (Time spent in Ready Queue +Execution Time)
Average Turn around Time = (Turn around Time for all the processes) / No. of Processes
= (7+2+14+24)/4 = 47/4
Preemptive scheduling – Round Robin (RR) Scheduling:

The term Round Robin is very popular among the sports and games activities. You might have
heard about ’Round Robin’ league or ’Knock out’ league associated with any football or cricket
tournament. In the ’Round Robin’ league each team in a group gets an equal chance to play
against the rest of the teams in the same group whereas in the ’Knock out’ league the losing team
in a match moves out of the tournament .
In Round Robin scheduling, each process in the ’Ready’ queue is executed for a pre-defined time
slot.
The execution starts with picking up the first process in the ’Ready’ queue. It is executed for
a pre-defined time and when the pre-defined time elapses or the process completes (before the
pre-defined time slice), the next process in the ’Ready’ queue is selected for execution.
This is repeated for all the processes in the ’Ready’ queue. Once each process in the ’Ready’ queue
is executed for the pre-defined time period, the scheduler comes back and picks the first process in
the ’Ready’ queue again for execution.
The sequence is repeated. This reveals that the Round Robin scheduling is similar to the FCFS
scheduling and the only difference is that a time slice based preemption is added to switch the
execution between the processes in the ‘Ready’ queue.
F IGURE 3.9: Round Robin Scheduling
Figure: Round Robin Scheduling
1. This is repeated for all the processes in the ‘Ready’ queue

2. Once each process in the ‘Ready’ queue is executed for the pre-defined time period, the
scheduler comes back and picks the first process in the ‘Ready’ queue again for execution.
3. Round Robin scheduling is similar to the FCFS scheduling and the only difference is that a
time slice based preemption is added to switch the execution between the processes in the
‘Ready’ queue
2 milliseconds respectively, enters the ready queue together in the order P1, P2, P3. Calculate the
waiting time and Turn Around Time (TAT) for each process and the Average waiting time and
Turn Around Time (Assuming there is no I/O waiting for the processes) in RR algorithm with
Time slice= 2ms.
Solution: The scheduler sorts the ‘Ready’ queue based on the FCFS policy and picks up the
first process P1 from the ‘Ready’ queue and executes it for the time slice 2ms. When the time
slice is expired, P1 is preempted and P2 is scheduled for execution. The Time slice expires after
2ms of execution of P2. Now P2 is preempted and P3 is picked up for execution. P3 completes
its execution within the time slice and the scheduler picks P1 again for execution for the next
time slice. This procedure is repeated till all the processes are serviced. The order in which the
processes are scheduled for execution is represented as
Waiting Time for P1 = 0 + (6-2) + (10-8) = 0+4+2= 6ms (P1 starts executing first
and waits for two time slices to get execution back and again 1 time slice for getting CPU time)
Waiting Time for P2 = (2-0) + (8-4) = 2+4 = 6ms (P2 starts executing after P1
executes for 1 time slice and waits for two time slices to get the CPU time)
Waiting Time for P3 = (4 -0) = 4ms (P3 starts executing after completing the first time slices for
P1 and P2 and completes its execution in a single time slice.)
= (6+6+4)/3 = 16/3
= 5.33 milliseconds
Turn Around Time (TAT) for P2 = 10 ms (-Do-)
Turn Around Time (TAT) for P3 = 6 ms (-Do-)
Average Turn around Time = (Turn around Time for all the processes) / No. of Processes
= (12+10+6)/3 = 28/3
= 9.33 milliseconds.
Preemptive scheduling – Priority based Scheduling
1. Same as that of the non-preemptive priority based scheduling except for the switching of
execution between tasks
2. In preemptive priority based scheduling, any high priority process entering the ‘Ready’
queue is immediately scheduled for execution whereas in the non-preemptive scheduling
any high priority process entering the ‘Ready’ queue is scheduled only after the currently
executing process completes its execution or only when it voluntarily releases the CPU
3. The priority of a task/process in preemptive priority based scheduling is indicated in the

same way as that of the mechanisms adopted for non- preemptive multitasking.
7 milliseconds and priorities 1, 3, 2 (0- highest priority, 3 lowest priority) respectively enters the
ready queue together. A new process P4 with estimated completion time 6ms and priority 0 enters
the ‘Ready’ queue after 5ms of start of execution of P1. Assume all the processes contain only
CPU operation and no I/O operations are involved.
Solution: At the beginning, there are only three processes (P1, P2 and P3) available in the ‘Ready’
queue and the scheduler picks up the process with the highest priority (In this example P1 with
priority 1) for scheduling. Now process P4 with estimated execution completion time 6ms and
priority 0 enters the ‘Ready’ queue after 5ms of start of execution of P1. The processes are re-
scheduled for execution in the following order
Waiting Time for P1 = 0 + (11-5) = 0+6 =6 ms (P1 starts executing first and gets
Preempted by P4 after 5ms and again gets the CPU time after completion of P4)
Waiting Time for P4 = 0 ms (P4 starts executing immediately on entering the
‘Ready’ queue, by preempting P1)
= (6 + 0 + 16 + 23)/4 = 45/4
Turn Around Time (TAT) for P4 = 6ms (Time spent in Ready Queue + Execution Time
= (Execution Start Time – Arrival Time) + Estimated Execution Time = (5-5) + 6 = 0 + 6)

Turn Around Time (TAT) for P3 = 23 ms (Time spent in Ready Queue + Execution Time) Turn
Around Time (TAT) for P2 = 28 ms (Time spent in Ready Queue + Execution Time) Average Turn
Around Time= (Turn Around Time for all the processes) / No. of Processes
= (16+6+23+28)/4 = 73/4
How to choose RTOS: The decision of an RTOS for an embedded design is very critical.A lot of
factors need to be analyzed carefully before making a decision on the selection of an RTOS.These
factors can be either
1. Functional
2. Non-functional requirements.
1. Functional Requirements:
Processor support:
1. It is not necessary that all RTOS’s support all kinds of processor architectures.
2. It is essential to ensure the processor support by the RTOS
Memory Requirements:
1. The RTOS requires ROM memory for holding the OS files and it is normally stored in a
non-volatile memory like FLASH.
2. OS also requires working memory RAM for loading the OS service.
3. Since embedded systems are memory constrained, it is essential to evaluate the minimal
RAM and ROM requirements for the OS under consideration.
Real-Time Capabilities:
1. It is not mandatory that the OS for all embedded systems need to be Real- Time and all
embedded OS’s are ‘Real-Time’ in behavior.
2. The Task/process scheduling policies plays an important role in the Real- Time behavior of
an OS.
Kernel and Interrupt Latency:
1. The kernel of the OS may disable interrupts while executing certain services and it may lead
to interrupt latency.
2. For an embedded system whose response requirements are high, this latency should be min-
imal.
Inter process Communication (IPC) and Task Synchronization:
1. The implementation of IPC and Synchronization is OS kernel dependent.
Modularization Support:
1. Most of the OS’s provide a bunch of features.
2. It is very useful if the OS supports modularization where in which the developer can choose
the essential modules and re-compile the OS image for functioning.
Support for Networking and Communication:
1. The OS kernel may provide stack implementation and driver support for a bunch of com-
munication interfaces and networking.
2. Ensure that the OS under consideration provides support for all the interfaces required by
the embedded product.
Development Language Support:
1. Certain OS’s include the run time libraries required for running applications written in
languages like JAVA and C++.
2. The OS may include these components as built-in component, if not; check the availability
of the same from a third party.
2. Non-Functional Requirements:
Custom Developed or Off the Shelf:
1. It is possible to go for the complete development of an OS suiting the embedded system

needs or use an off the shelf, readily available OS.
2. It may be possible to build the required features by customizing an open source OS.
3. The decision on which to select is purely dependent on the development cost, licensing fees
for the OS, development time and availability of skilled resources.
4. Cost
5. The total cost for developing or buying the OS and maintaining it in terms of commercial
product and custom build needs to be evaluated before taking a decision on the selection of
OS.
Development and Debugging tools Availability:
1. The availability of development and debugging tools is a critical decision making factor in
the selection of an OS for embedded design.
2. Certain OS’s may be superior in performance, but the availability of tools for supporting the
development may be limited.
Ease of Use:
1. How easy it is to use a commercial RTOS is another important feature that needs to be
considered in the RTOS selection.
After Sales:
1. For a commercial embedded RTOS, after sales in the form of e-mail, on-call services etc.
for bug fixes, critical patch updates and support for production issues etc. should be analyzed
thoroughly.
3.8 TASK COMMUNICATION:
In a multitasking system, multiple tasks/processes run concurrently (in pseudo parallelism) and
each process may or may not interact between. Based on the degree of interaction, the processes
running on an OS are classified as,
1. Co-operating Processes: In the co-operating interaction model one process requires the inputs
from other processes to complete its execution.
2. Competing Processes: The competing processes do not share anything among themselves but
they share the system resources. The competing processes compete for the system resources such
as file, display device, etc.
Co-operating processes exchanges information and communicate through the following methods.
Co-operation through Sharing: The co-operating process exchange data through some shared
resources.
Co-operation through Communication: No data is shared between the processes. But they
communicate for synchronization.
The mechanism through which processes/tasks communicate each other is known as “Inter Pro-
cess/Task Communication (IPC)”. Inter Process Communication is essential for process co-
ordination. The various types of Inter Process Communication (IPC) mechanisms adopted by
process are kernel (Operating System) dependent. Some of the important IPC mechanisms adopted
by various kernels are explained below.
Shared Memory:
Processes share some area of the memory to communicate among them. Information to be com-
municated by the process is written to the shared memory area. Other processes which require this
information can read the same from the shared memory area. It is same as the real world example
where ’Notice Board’ is used by corporate to publish the public information among the employees
(The only exception is; only corporate have the right to modify the-information published on the
Notice board and employees are given ’Read’ only access, meaning it is only a one way channel).
F IGURE 3.10: Concept of shared memory
The implementation of shared memory concept is kernel dependent. Different mechanisms are
adopted by different kernels for implementing this. A few among them are:
3.9 Pipes:
’Pipe’ is a section of the shared memory used by processes for communicating. Pipes follow the
client-server architecture. A process which creates a pipe is known as a pipe server and a process
which connects to a pipe is known as pipe client. A pipe can be considered as a conduit for
information flow and has two conceptual ends. It can be unidirectional, allowing information flow
in one direction or bidirectional allowing bi-directional information flow. A unidirectional pipe
allows the process connecting at one end of the pipe to write to the pipe and the process connected
at the other end of the pipe to read the data, whereas a bi-directional pipe allows both reading and
writing at one end. The unidirectional pipe can be visualized as
F IGURE 3.11: unidirectional pipe
The implementation of ‘pipes’ is also OS dependent. Microsoft® Windows Desktop Operating

Systems support two types of ’Pipes’ for Inter Process Communication. They are:
Anonymous Pipes: The anonymous pipes-are unnamed, unidirectional pipes used for data transfer
between two processes.
Named Pipes: Named pipe is a named, unidirectional or bi-directional pipe for data exchange
between processes. Like anonymous pipes, the process which creates the named pipe is known as
pipe server. A process which connects to the named pipe is known as pipe client.
With named pipes, any process can act as both client and server allowing point-to-point commu-
nication. Named pipes can be used for communicating between processes running on the same
machine or between processes running on different machines connected to a network.
Please refer to the Online Learning Centre for details on the Pipe implementation under Windows
Operating Systems.
Under VxWorks kernel, pipe is a special implementation of message queues. We will discuss the
same in a latter chapter.
Memory Mapped Objects:
Memory mapped object is a shared memory technique adopted by certain Real-Time Operating
Systems for allocating a shared block of memory which can be accessed by multiple process
simultaneously (of course certain synchronization techniques should be applied to prevent incon-
sistent results). In this approach a mapping object is created and physical storage for it is reserved
and committed. A process can map the entire committed physical area or a block of it to its virtual
address space. All read and write operation to this virtual address space by a process is directed to
its committed physical area. Any process which wants to share data with other processes can map
the physical memory area of the mapped object to its virtual memory space and use it for sharing
the data.
Message Passing:
Message passing-is an (a) synchronous information exchange mechanism used for Inter Pro-
cess/Thread Communication. The major difference between shared memory and message passing
technique is that, through shared memory lots of data can be shared whereas only limited amount
of info/data is passed through message passing. Also message passing is relatively fast and free
from the synchronization overheads compared to shared memory. Based on the message passing
operation between the processes, message passing is classified into:
1. Message Queue.
2. Mailbox.
3. Signaling.
Message Queue: Usually the process which wants to talk to another process posts the message to
a First-In-First-Out (FIFO) queue called ’Message queue’, which stores the messages temporarily
in a system defined memory object, to pass it to the desired process (Fig. 10.20). Messages are
sent and received through send (Name of the process to which the message is to be sent,-message)
and receive (Name of the process from which the message is to be received, message) methods.
The messages are exchanged through a message queue. The implementation of the message queue,
send and receive methods are OS kernel dependent. The Windows XP OS kernel maintains a single
system message queue and one process/thread (Process and threads are used interchangeably here,
since thread is the basic unit of process in windows) specific message queue. A thread which
wants to communicate with another thread posts the message to the system message queue. The
kernel picks up the message from the system message queue one at a time and examines the
message for finding the destination thread and then posts the message to the message queue of
the corresponding thread. For posting a message to a thread’s message queue, the kernel fills a
message structure MSG and copies it to the message queue of the thread. The message structure
MSG contains the handle of the process/thread for which the message is intended, the message
parameters, the time at which the message is posted, etc. A thread can simply post a message to
another thread and can continue its operation or it may wait for a response from the thread to which
the message is posted. The messaging mechanism is classified into synchronous and asynchronous
based on the behaviour of the message posting thread. In asynchronous messaging, the message
posting thread just posts the message to the queue and it will not wait for an acceptance (return)
from the thread to which the message is posted, whereas in synchronous messaging, the thread
which posts a message enters waiting state and waits for the message result from the thread to
which the message is posted. The thread which invoked the send message becomes blocked and
the scheduler will not pick it up for scheduling. The PostMessage (HWND hWnd, UINT Msg,
WPARAM wParam, LPARAM /Param) or PostThreadMessage (DWORD idThread, UNT Msg,
WPARAM wParam, LPARAM IParam) API is used by a thread in Windows for posting a message
to its own message queue or to the message queue of another thread.
F IGURE 3.12: message queue based indirect messaging for IPC
The PostMessage API does not always guarantee the posting of messages to message queue. The
PostMessage API will not post a message to the message queue when the message queue is full.
Hence it is recommended to check the return value of PostMessage API to confirm the posting of
message. The SendMessage (HWND hWnd, U1NT Msg, WPARAM wParam, LPARAM 1Param)
API call sends a message to the thread specified by the handle hWnd and waits for the callee thread
to process the message. The thread which calls the SendMessage API enters waiting state and waits
for the message result from the thread to which the message is posted. The thread which invoked
the SendMessage API call becomes blocked and the scheduler will not pick it up for scheduling.
The Windows CE operating system supports a special Point-to-Point Message queue implemen-
tation. The OS maintains a First In First Out (FIFO) buffer for storing the messages and each
process can access this buffer for reading and writing messages. The OS also maintains a special
queue, with single message storing capacity, for storing high priority messages Werlmessages).
Mailbox:
Mailbox is an alternate form of ’Message queues’ and it is used in certain Real-Time Operating
Systems for IPC. Mailbox technique for IPC in RTOS is usually used for one way messaging. The
task/thread which wants to send a message to other tasks/threads creates a mailbox for posting the
messages. The threads which are interested in receiving the messages posted to the mailbox by the
mailbox creator thread can subscribe to the mailbox.
The thread which creates the mailbox is known. as ’mailbox server’ and the threads which sub-
scribe to the mailbox are known as ’mailbox clients’. The mailbox server posts messages to the
mailbox and notifies it to the clients which are subscribed to the mailbox. The clients read the
message from the mailbox on receiving the notification.
F IGURE 3.13: mailbox based indirect messaging for IPC
The mailbox creation, subscription, message reading and writing are achieved through OS kernel
provided API calls. Mailbox and message queues are same in functionality. The only difference is
in the number of messages supported by them. Both of them are used for passing data in the form
of message(s) from a task to another task(s).
Mailbox is used for exchanging a single, message between two tasks or between an Interrupt
Service Routine (ISR) and a task. Mailbox associates a pointer pointing to the mailbox and a
wait list to hold the tasks waiting for a message to appear in the mailbox. The implementation
of mailbox is OS kernel dependent. The MicroC/OS-II implements mailbox as a mechanism for
inter-task communication.
Signaling:
Signaling is a primitive way of communication between process-es/threads. Signals are used for
asynchronous notifications where one process/thread fires a signal, indicating the occurrence of
a scenario which the other process(es)/thread(s) is waiting. Signals are not queued and they do
not carry any data. The communication mechanisms used in RTX51 Tiny OS is an example for
Signaling. The amend signal kernel call under RTX 51 sends a signal from one task to a specified
task. Similarly the os wait kernel call waits for a specified signal. The VxWorks RTOS kernel also
implements ’signals’ for inter process communication. Whenever a signal occurs it is handled in a
signal handler associated with the signal. Remote Procedure Call (RPC) and Sockets:
Remote Procedure Call or RPC is the Inter Process Communication (IPC) mechanism used by a
process to call a procedure of another process running on the same CPU or on a different CPU
which is interconnected in a network. In the object oriented language terminology RPC is also
known as Remote Invocation or Remote Method Invocation (RMI). RPC is mainly used for dis-
tributed applications like client-server applications. With RPC it is possible to communicate over
a heterogeneous network (i.e. Network where Client and server applications are running on dif-
ferent Operating systems). The CPU/process containing the procedure which needs to be invoked
remotely is known as server. The CPU/process which initiates an RPC request is known as client.
F IGURE 3.14: Concept of Remote Procedure Call (RPC) for IPC
F IGURE 3.15: Concept of Remote Procedure Call (RPC) for IPC
It is possible to implement RPC communication with different invocation interfaces. In order

to make the RPC communication compatible across all platforms, it should stick on to certain
standard formats. Interface Definition Language (IDL) defines the interfaces for RPC.
Microsoft Interface Definition Language (MIDL) is the IDL implementation from Microsoft for
all Microsoft platforms. The RPC communication can be either Synchronous (Blocking) or Asyn-
chronous (Non-blocking). In the Synchronous communication, the process which calls the remote
procedure is blocked until it receives a response back from the other process. In asynchronous
RPC calls, the calling process continues its execution while the remote process performs the exe-
cution of the procedure. The result from the remote procedure is returned back to the caller through
mechanisms like callback functions.
On security front, RPC employs authentication mechanisms to protect the systems against vulner-
abilities. The client applications (processes)-should authenticate themselves with the server for
getting access. Authentication mechanisms like IDs, public-key cryptography, etc. are used by the
client for authentication. Without authentication, any client can access the remote procedure. This
may lead to potential security risks.
Sockets are used for RPC communication. The socket is a logical endpoint in a two-way commu-
nication link between two applications running on a network. A port number is associated with a
socket so that the network layer of the communication channel can deliver the data to the desig-
nated application. Sockets are of different types, namely, Internet sockets (INET), UNIX sockets,
etc. The INET socket works on internet communication protocol TCP/IP, UDP (User Datagram
Protocol), etc. are the communication protocols used by INET sockets. INET sockets are clas-
sified into: 1. Stream sockets 2. Datagram sockets Stream sockets are connection-oriented and
they use TCP to establish liable connection. On the other hand, Datagram sockets rely on UDP
for establishing a connection. The UDP connection is unreliable when compared to TCP. The
client-server communication model uses a socket at the client-side and a socket at the server-side.
A port number is assigned to both of these sockets. The client and server should be aware of the
port number associated with the socket. In order to start the communication, the client needs to
send a connection request to the server at the specified port number.
The client should be aware of the name of the server along with its port number. The server
always listens to the specified port number on the network. Upon receiving a connection request
from the client, based on the success of authentication, the server grants the connection request
and a communication channel is established between the client and server. The client uses the
hostname and port number of the server for sending requests and the server uses the client’s name
and port number for sending responses.
3.10 TASK SYNCHRONISATION:
In a multitasking environment, multiple processes run concurrently (in pseudo parallelism) and
share the system resources. Apart from this, each process has its own boundary wall and they
communicate with each other with different IPC mechanisms including shared memory and vari-
ables. Imagine a situation where two processes try to access display hardware connected to the
system or two processes try to access a shared memory area where one process tries to write to a
memory location when the other process is trying to read from this.
What could be the result in these scenarios? Obviously unexpected results. How these issues can
be addressed? The solution is, make each process aware of the access of a shared resource either
directly or indirectly. The act of making processes aware of the access of shared resources by each
process to avoid conflicts is known as ‘Task/Process Synchronization’. Various synchronization
issues may arise in a multitasking environment if processes are not synchronized properly.
The following sections describe the major task communication/ synchronization issues observed
in multitasking and the commonly adopted synchronization techniques to overcome these issues.
Task Communication/Synchronization Issues:
Racing: Let us have a look at the following piece of code.

From a programmer perspective, the value of the counter will be 10 at the end of the execution
of processes A & B. But ’it need not be always’ in a real-world execution of this piece of code
under a multitasking kernel. The results depend on the process scheduling policies adopted by
the OS kernel. The program statement counter++; looks like a single statement from a high-
level programming language (‘C’ language) perspective. The low-level implementation of this
statement is dependent on the underlying processor instruction set and the (cross) compiler in use.
The low-level implementation of the high-level program statement counter++; under Windows XP
operating system running on an Intel Centrino Duo processor is given below.
mov eax, dword ptr [ebp-4] ; Load counter in Accumulator
add eax,1 ; Increment Accumulator by 1
mov dword ptr [ebp-4], eax ; Store counter with Accumulator
At the processor instruction level, the value of the variable counter is loaded to the Accumulator
register (EAX register). The memory variable counter is represented using a pointer. The base
pointer register (EBP register) is used for pointing to the memory variable counter. After loading
the contents of the variable-counter to the Accumulator, the Accumulator content is incremented
by one using the add instruction. Finally the content of Accumulator is loaded to the memory
location which represents the variable counter. Both the processes Process A and Process B contain
the program statement counter++; Translating this into the machine instruction.
Imagine a situation where a process switching (context switching) happens from Process A to
Process B when Process A is executing the counter++; statement. Process A accomplishes the
counter++; statement through three different low-level instructions. Now imagine that the pro-
cess switching happened at the point where Process A executed the low-level instruction, ‘mov
eax,dword ptr [ebp-4]’ and is about to execute the next instruction ’add eax,1’.
F IGURE 3.16: Race condition

Though the variable counter is incremented by Process B, Process A is unaware of it and it in-
crements the variable with the old value. This leads to the loss of one increment for the Variable
counter. This problem occurs due to non-atomic Operation on variables. This issue wouldn’t have
been occurred if the underlying actions corresponding to the program statement counter++; is fin-
ished in a single CPU execution cycle. The best way to avoid this situation is make the access and
modification of shared variables mutually exclusive; meaning when one process accesses a shared
variable, prevent the other processes from accessing it. To summarize, Racing or Race condition
is the situation in which multiple processes compete (race) each other to access and manipulate
shared data concurrently. In a Race condition, the final value of the shared data depends on the
process which acted on the data finally.
3.11 Deadlock
A race condition produces incorrect results whereas a deadlock condition creates a situation where
none of the processes are able to make any progress in their execution, resulting in a get of dead-
locked processes. A situation very similar to our traffic jam issues in a junction.
F IGURE 3.17: Deadlock Visualization
In its simplest form ’deadlock’ is the condition in which a process is waiting for a resource held by
another process which is waiting for a resource held by the first process. To elaborate: Process A
holds a resource x and it wants a resource y held by Process B. Process B is currently holding re-
source y and it wants the resource x which is currently held by Process A. Both hold the respective
resources and they compete each other to get the resource held by the respective processes. The
result of the competition is ’deadlock’. None of the competing processes will be able to access the
resources held by other processes since they are locked by the respective processes.
F IGURE 3.18: Deadlock leading
The different conditions favoring a deadlock situation are listed below.
Mutual Exclusion: The criteria that only one process can hold resource at a time. Meaning
processes should access shared resources with mutual exclusion. Typical example is the accessing
of display hardware in an embedded device.
Hold and Walt: The condition in which a process holds a shared resource by acquiring the lock
controlling the shared access and waiting for additional resources held by other processes.
No Resource Preemption: The criteria that operating system cannot take back a resource from
a process which is currently holding it and the resource can only be released voluntarily by the
process holding it.
Circular Wait: A process is waiting for a resource which is currently held by another process
which in turn is waiting for a resource held by the first process. In general, there exists a set of
waiting process P0, P1, Pn with P0 is waiting for a resource held by P1 and P1 is waiting for a
resource held P0, Pn is waiting for a resource held by P0 and P0 is waiting for a resource held by
Pn and so on... This forms a circular wait queue.
Deadlock Handling: A smart OS may foresee the deadlock condition and will act proactively to
avoid such a situation. Now if a deadlock occurred, how the OS responds to it? The reaction to
deadlock condition by OS is nonuniform. The OS may adopt any of the following techniques to
detect and prevent deadlock conditions.
(i).Ignore Deadlocks: Always assume that the system design is deadlock free. This is acceptable
for the reason the cost of removing a deadlock is large compared to the chance of happening a
deadlock. UNIX is an example for an OS following this principle. A life critical system cannot
pretend that it is deadlock free for any reason.
(ii). Detect and Recover: This approach suggests the detection of a deadlock situation and recov-
ery from it. This is similar to the deadlock condition that may arise at a traffic junction.
When the vehicles from different directions compete to cross the junction, deadlock (traffic jam)
condition is resulted. Once a deadlock (traffic jam) is happened at the junction, the only solution is
to back up the vehicles from one direction and allow the vehicles from opposite direction to cross
the junction. If the traffic is too high, lots of vehicles may have to be backed up to resolve the
traffic jam. This technique is also known as ‘back up cars’ technique.
Operating systems keep a resource graph in their memory. The resource graph is updated on each
resource request and release.
Avoid Deadlocks: Deadlock is avoided by the careful resource allocation techniques by the Oper-
ating System. It is similar to the traffic light mechanism at junctions to avoid the traffic jams.
Prevent Deadlocks: Prevent the deadlock condition by negating one of the four conditions favor-
ing the deadlock situation.
Ensure that a process does not hold any other resources when it requests a resource. This can
be achieved by implementing the following set of rules/guidelines in allocating resources to pro-
cesses.
1. A process must request all its required resource and the resources should be allocated before
the process begins its execution.
2. Grant resource allocation requests from processes only if the process does not hold a resource
currently.
Ensure that resource preemption (resource releasing) is possible at operating system level. This
can be achieved by implementing the following set of rules/guidelines in resources allocation and
releasing.
1. Release all the resources currently held by a process if a request made by the process for a new
resource is not able to fulfil immediately.
2. Add the resources which are preempted (released) to a resource list describing the resources
which the process requires to complete its execution.
Reschedule the process for execution only when the process gets its old resources and the new
resource which is requested by the process. Imposing these criterions may introduce negative
impacts like low resource utilization and starvation of processes.
Livelock: The Livelock condition is similar to the deadlock condition except that a process in
livelock condition changes its state with time. While in deadlock a process enters in wait state
for a resource and continues in that state forever without making any progress in the execution, in
a livelock condition a process always does something but is unable to make any progress in the
execution completion. The livelock condition is better explained with the real world example, two
people attempting to cross each other in a narrow corridor. Both the persons move towards each
side of the corridor to allow the opposite person to cross. Since the corridor is narrow, none of
them are able to cross each other. Here both of the persons perform some action but still they are
unable to achieve their target, cross each other. We will make the livelock, the scenario more clear
in a later section—The Dining Philosophers ’ Problem, of this chapter.
Starvation: In the multitasking cont on is the condition in which a process does not get the
resources required to continue its execution for a long time. As time progresses the process starves
on resource. Starvation may arise due to various conditions like byproduct of preventive measures
of deadlock, scheduling policies favoring high priority tasks and tasks with shortest execution
time, etc.
The Dining Philosophers’ Problem: The ’Dining philosophers ’problem’ is an interesting exam-
ple for synchronization issues in resource utilization. The terms ’dining’, ’philosophers’, etc. may
sound awkward in the operating system context, but it is the best way to explain technical things
abstractly using non-technical terms. Now coming to the problem definition:
Five philosophers (It can be ’n’. The number 5 is taken for illustration) are sitting around a round
table, involved in eating and brainstorming. At any point of time each philosopher will be in any
one of the three states: eating, hungry or brainstorming. (While eating the philosopher is not
involved in brainstorming and while brainstorming the philosopher is not involved in eating). For
eating, each philosopher requires 2 forks. There are only 5 forks available on the dining table
(’n’ for ’n’ number of philosophers) and they are arranged in a fashion one fork in between two
philosophers. The philosopher can only use the forks on his/her immediate left and right that too
in the order pickup the left fork first and then the right fork. Analyze the situation and explain the
possible outcomes of this scenario.
Let’s analyze the various scenarios that may occur in this situation.
Scenario 1: All the philosophers involve in brainstorming together and try to eat together. Each
philosopher picks up the left fork and is unable to proceed since two forks are required for eating
the spaghetti present in the plate. Philosopher 1 thinks that Philosopher 2 sitting to the right of
him/her will put the fork down and waits for it. Philosopher 2 thinks that Philosopher 3’ sitting to
the right of him/her will
F IGURE 3.19: Visualization of the ‘Dining Philosophers’ Problem’
put the fork down and waits for it, and so on. This forms a circular chain of un-granted requests.
If the philosophers continue in this state waiting for the fork from the philosopher sitting to the
right of each, they will not make any progress in eating and this will result in starvation of the
philosophers and deadlock.
Scenario 2: All the philosophers start brainstorming together. One of the philosophers is hungry
and he/ she picks up the left fork. When the philosopher is about to pick up the right fork, the
philosopher sitting. to his right also become hungry and tries to grab the left fork which is the
right fork of his neighboring philosopher who is trying to lift it, resulting in a ’Race condition’..
Scenario 3: All the philosophers involve in brainstorming together and by to eat together. Each
philosopher picks up the left fork and is unable to proceed, since two forks are required for eating
the spaghetti present in the plate. Each of them anticipates that the adjacently sitting philosopher
will put his/her fork down and waits for a fixed duration grid after this puts the fork down. Each
of them again tries to lift the fork after a fixed duration of time. Since all philosophers are trying
to lift the fork at the same time, none of them will be able to grab two forks. This condition leads
to livelock and starvation of philosophers, where each philosopher tries to do something, but they
are unable to make any progress in achieving the target. Figure illustrates these scenarios.
Solution: We need to find out alternative solutions to avoid the.deadlock, livelock, racing and
starvation condition that may arise due to the concurrent access of forks by philosophers. This
situation can be handled in many ways by allocating the forks in different allocation techniques
including round Robin allocation, FIFO allocation: etc.
But the requirement is that the solution should be optimal, avoiding deadlock and starvation of the
philosophers and allowing maximum number of philosophers to eat at a time. One solution that
we could think of is:
Imposing rules in accessing the forks by philosophers, like: The philosophers should put down the
fork he/she already have in hand (left fork) after waiting for a fixed duration for the second fork
(right fork) and should wait for a fixed time before making the next attempt.
This solution works fine to some extent, but, if all the philosophers try to lift the forks at the same
time, a livelock situation is resulted.
Another solution which gives maximum concurrency that can be thought of is each philosopher
ac-quires a semaphore (mutex) before picking up any fork. When a philosopher feels hungry
he/she checks whether the philosopher sitting to the left and right of him is already using the fork,
by checking the state of the associated semaphore. If the forks are in use by the neighboring
philosophers, the philosopher waits till the forks are available. A philosopher when finished eating
puts the forks down and informs the philosophers sitting to his/her left and right, who are hungry
(waiting for the forks), by signaling the semaphores associated with the forks.
F IGURE 3.20: ’Real Problems’ in the ’Dining Philosophers ’
We will discuss about semaphores and mutexes at a latter section of this chapter. In the operating
system context, the dining philosophers represent the processes and forks represent the resources.
The dining philosophers’ problem is an analogy of processes competing for shared resources and
the different problems like racing, deadlock, starvation and livelock arising from the competition.
Producer-Consumer/Bounded Buffer Problem: Producer-Consumer problem is a common data

sharing problem where two processes concurrently access a shared buffer with fixed size. A thread-
/process which produces data is called ’Producer thread/process’ and a thread/process which con-
sumes the data produced by a producer thread/process is known as ’Consumer thread/process’.
Imagine a situation where the producer thread keeps on producing data and puts it into the buffer
and the consumer thread keeps on consuming the data from the buffer and there is no synchroniza-
tion between the two. There may be chances where in which the producer produces data at a faster
rate than the rate at which it is consumed by the consumer. This will lead to ’buffer overrun’ where
the producer tries to put data to a full buffer. If the consumer consumes data at a faster rate than the
rate at which it is produced by the producer, it will lead to the situation ‘buffer under-run’ in which
the consumer tries to read from an empty buffer. Both of these conditions will lead to inaccurate
data and data loss. The following code snippet illustrates the producer-consumer problem
Here the ’producer thread’ produces random numbers and puts it in a buffer of size 20. If the
’producer thread’ fills the buffer fully it re-starts the filling of the buffer from the bottom. The
’consumer thread’ consumes the data produced by, the ’producer thread’. For consuming the
data, the ’consumer thread’ reads the buffer which is shared with the ’producer thread’. Once the
’consumer thread’ consumes all the data, it starts consuming the data from the bottom of the buffer.
These two threads run independently and are scheduled for execution based on the scheduling
policies adopted by the OS. The different situations that may arise based on the scheduling of the
’producer thread’ and ’consumer thread’ is listed below.
1. ’Producer thread’ is scheduled more frequently than the ’consumer thread’: There are chances
for overwriting the data in the buffer by the ’producer thread’. This leads to inaccurate data.
2. Consumer thread’ is scheduled more frequently than the ’producer thread’: There are chances
for reading the old data in the buffer again by the ’consumer thread’. This will also lead to inac-
curate data.
The output of the above program when executed on a Windows XP machine is shown in Fig.
10.29. The output shows that the consumer thread runs faster than the producer thread and most
often leads to buffer under-run and thereby inaccurate data. The producer-consumer problem can
be rectified in various methods. One simple solution is the ‘sleep and wake-up’. The ’sleep and
wake-up’ can be implemented in various process synchronization techniques like semaphores,
mutex, monitors, etc. We will discuss it in a latter section of this chapter.
F IGURE 3.21: Output of win32 program illustrating producer-consumer problem
Readers-Writers Problem: Tire Readers-Writers problem is a common issue observed in pro-

cesses competing for limited shared resources. The Readers-Writers problem is characterized by
multiple processes trying to read and write shared data concurrently. A typical real-world example
for the Readers-Writers problem is the banking system where one process tries to read the account
information like available balance and the other process tries to update the available balance for
that account. This may result in inconsistent results. If multiple processes try to read a shared
data concurrently it may not create any impacts, whereas when multiple processes try to write and
read concurrently it will definitely create inconsistent results. Proper synchronization techniques
should be applied to avoid the readers-writers problem. We will discuss about the various syn-
chronization techniques in a later section of this chapter. Priority Inversion: Priority inversion
is the byproduct of the combination of blocking based (lock based) process synchronization and
pre-emptive priority scheduling. ’Priority inversion’ is the condition in which a high priority task
needs to wait for a low priority task to release a resource which is shared between the high pri-
ority task and the low priority task, and a medium priority task which doesn’t require the shared
resource continue its execution by preempting the low priority task. Priority based preemptive
scheduling technique ensures that a high priority task is always executed first, whereas the lock
based process synchronization mechanism (like mutex, semaphore, etc.) ensures that a process
will not access a shared resource, which is currently in use by another process. The synchroniza-
tion technique is only interested in avoiding conflicts that may arise due to the concur-rent access
of the shared resources and not at all bothered about the priority of the process which tries to ac-
cess the shared resource. In fact, the priority based preemption and lock based synchronization
are the two contradicting OS primitives. Priority inversion is better explained with the following
scenario: Let Process A, Process B and Process C be three processes with priorities High, Medium
and Low respectively. Process A and Process C share a variable ’X’ and the access to this variable
is synchronized through a mutual exclusion mechanism like Binary Semaphore S.
F IGURE 3.22: Priority Inversion
Imagine a situation where Process C is ready and is picked up for execution by the scheduler and
’Process C’ tries to access the shared variable ’X’. ’Process C’ acquires the ’Semaphore S’ to
indicate the other processes that it is accessing the shared variable ’X’. Immediately after ’Process
C’ acquires the ’Semaphore S’, ’Process B’ enters the ’Ready’ state. Since ’Process B’ is of higher
priority compared to ’Process C’, ’Process C’ is preempted, and ’Process B’ starts executing. Now
imagine ’Process A’ enters the ’Ready’ state at this stage. Since ’Process A’ is of higher priority
than ’Process B’, ’Process B’ is preempted, and ’Process A’ is scheduled for execution. ’Process
A’ involves accessing of shared variable ’X’ which is currently being accessed by ’Process C’.
Since ’Process C’ acquired the semaphore for signaling the access of the shared variable ’X’,
’Process A’ will not be able to access it. Thus ’Process A’ is put into blocked state (This condition
is called Pending on resource). Now ’Process B’ gets the CPU and it continues its execution until
it relinquishes the CPU voluntarily or enters a wait state or preempted by another high priority
task. The highest priority process ’Process A’ has to wait till ’Process C’ gets a chance to execute
and release the semaphore. This produces unwanted delay in the execution of the high priority
task which is supposed to be executed immediately when it was ’Ready’. Priority inversion may
be sporadic in nature but can lead to potential damages as a result f missing critical deadlines.
Literally speaking, priority inversion ’inverts’ the priority of a high priority task with that of a low
priority task. Proper workaround mechanism should be adopted for handling the priority inversion
problem. The commonly adopted priority inversion workarounds are:
through a mutual exclusion mechanism like Binary Semaphore S. Imagine a situation where Pro-
cess C is ready and is picked up for execution by the scheduler and ’Process C’ tries to access the
shared variable ’X’. ’Process C’ acquires the ’Semaphore S’ to indicate the other processes that it
is accessing the shared variable ’X’. Immediately after ’Process C’ acquires the ’Semaphore S’,
’Process B’ enters the ’Ready’ state. Since ’Process B’ is of higher priority compared to ’Process
C’, ’Process C’ is preempted, and ’Process B’ starts executing. Now imagine ’Process A’ enters
the ’Ready’ state at this stage. Since ’Process A’ is of higher priority than ’Process B’, ’Process
B’ is preempted, and ’Process A’ is scheduled for execution. ’Process A’ involves accessing of
shared variable ’X’ which is currently being accessed by ’Process C’. Since ’Process C’ acquired
the semaphore for signaling the access of the shared variable ’X’, ’Process A’ will not be able
to access it. Thus ’Process A’ is put into blocked state (This condition is called Pending on re-
source). Now ’Process B’ gets the CPU and it continues its execution until it relinquishes the CPU
voluntarily or enters a wait state or preempted by another high priority task. The highest priority
process ’Process A’ has to wait till ’Process C’ gets a chance to execute and release the semaphore.
This produces unwanted delay in the execution of the high priority task which is supposed to be
executed immediately when it was ’Ready’. Priority inversion may be sporadic in nature but can
lead to potential damages as a result f missing critical deadlines. Literally speaking, priority inver-
sion ’inverts’ the priority of a high priority task with that of a low priority task. Proper workaround
mechanism should be adopted for handling the priority inversion problem. The commonly adopted
priority inversion workarounds are:
Priority Inheritance: A low-priority task that is currently accessing (by holding the lock) a shared
resource requested by a high-priority task temporarily ’inherits’ the priority of that high-priority
task, from the moment the high-priority task raises the request. Boosting the priority of the low
priority task to that of the priority of the task which requested the shared resource holding by the
low priority task eliminates the preemption of the low priority task by other tasks whose priority
are below that of the task requested the shared resource ’and thereby reduces the delay in waiting
to get the resource requested by the high priority task. The priority of the low priority task which
is temporarily boosted to high is brought to the original value when it releases the shared resource.
Implementation of Priority inheritance workaround in the priority inversion problem discussed for
Process A, Process B and Process C example will change the execution sequence as shown in
Figure.
F IGURE 3.23: Priority Inheritance
Priority inheritance is only a work around and it will not eliminate the delay in waiting the high
priority task to get the resource from the low priority task. The only thing is that it helps the low
priority task to continue its execution and release the shared resource as soon as possible. The
moment, at which the low priority task releases the shared resource, the high priority task kicks
the low priority task out and grabs the CPU - A true form of selfishness. Priority inheritance
handles priority inversion at the cost of run-time overhead at scheduler. It imposes the overhead of
checking the priorities of all tasks which tries to access shared resources and adjust the priorities
dynamically.
Priority Ceiling: In ’Priority Ceiling’, a priority is associated with each shared resource. The
priority associated to each resource is the priority of the highest priority task which uses this
shared resource. This priority level is called ’ceiling priority’. Whenever a task accesses a shared
resource, the scheduler elevates the priority of the task to that of the ceiling priority of the resource.
If the task which accesses the shared resource is a low priority task, its priority is temporarily
boosted to the priority of the highest priority task to which the resource is also shared. This
eliminates the pre-emption of the task by other medium priority tasks leading to priority inversion.
The priority of the task is brought back to the original level once the task completes the accessing
of the shared resource. ’Priority Ceiling’ brings the added advantage of sharing resources without
the need for synchronization techniques like locks. Since the priority of the task accessing a
shared resource is boosted to the highest priority of the task among which the resource is shared,
the concurrent access of shared resource is automatically handled. Another advantage of ’Priority
Ceiling’ technique is that all the overheads are at compile time instead of run-time. Implementation
of ’priority ceiling’ workaround in the priority inversion problem discussed for Process A, Process
B and Process C example will change the execution sequence as shown in Figure.
F IGURE 3.24: Handling Priority Inversion problem with priority Ceiling
The biggest drawback of ’Priority Ceiling’ is that it may produce hidden priority inversion. With
’Priority Ceiling’ technique, the priority of a task is always elevated no matter another task wants
the shared resources. This unnecessary priority elevation always boosts the priority of a low pri-
ority task to that of the highest priority tasks among which the resource is shared and other tasks
with priorities higher than that of the low priority task is not allowed to preempt the low priority
task when it is accessing a shared resource. This always gives the low priority task the luxury of
running at high priority when accessing shared resources.
Task Synchronization Techniques
So far we discussed about the various task/process synchronization issues encountered in multi-
tasking systems due to concurrent resource access. Now let’s have a discussion on the various
techniques used for synchronization in concurrent access in multitasking. Process/Task synchro-
nization is essential for
1. Avoiding conflicts in resource access (racing, deadlock, starvation, livelock, etc.) in a multi-
tasking environment.
2. Ensuring proper sequence of operation across processes: The producer consumer problem is
a typical example for processes requiring proper sequence of operation. In producer consumer
problem, accessing the shared buffer by different processes is not the issue; the issue is the writing
process should write to the shared buffer only if the buffer is not full and the consumer thread
should not read from the buffer if it is empty. Hence proper synchronization should be provided to
implement this sequence of operations.
3. Communicating between processes: The code memory area which holds the program instruc-
tions (piece of code) for accessing a shared resource (like shared memory, shared variables, etc.)
is known as ’critical section’. In order to synchronize the access to shared resources, the access to
the critical section should be exclusive. The exclusive access to critical section of code is provided
through mutual exclusion mechanism. Let us have a look at how mutual exclusion is important
in concurrent access. Consider two processes Process A and Process B running on a multitasking
system. Process A is currently running and it enters its critical section. Before Process A com-
pletes its operation in the critical section, the scheduler preempts Process A and schedules Process
B for execution (Process B is of higher priority compared to Process A). Process B also contains
the access to the critical section which is already in use by Process A. If Process B continues its
execution and enters the critical section which is already in use by Process A, a racing condition
will be resulted. A mutual exclusion policy enforces mutually exclusive access of critical sec-
tions. Mutual exclusions can be enforced in different ways. Mutual exclusion blocks a process.
Based on the behaviour of the blocked process, mutual exclusion methods can be classified into
two categories. In the following section we will discuss them in detail.
Mutual Exclusion through Busy Waiting/Spin Lock: ’Busy waiting’ is the simplest method for
enforcing mutual exclusion. The following code snippet illustrates how ’Busy waiting’ enforces
mutual exclusion.
The ’Busy waiting’ technique uses a lock variable for implementing mutual exclusion. Each pro-
cess/ thread checks this lock variable before entering the critical section. The lock is set to ’1’ by
a process/ thread if the process/thread is already in its critical section; otherwise the lock is set
to ’0’. The major challenge in implementing the lock variable based synchronization is the non-
availability of a single atomic instruction which combines the reading, comparing and setting of
the lock variable. Most often the three different operations related to the locks, viz. the operation
of Reading the lock variable, checking its present value, and setting it are achieved with multiple
low-level instructions. The low-level implementation of these operations are dependent on the un-
derlying processor instruction set and the (cross) compiler in use. The low-level implementation of
the ’Busy waiting’ code snippet, which we discussed earlier, under Windows XP operating system
running on an Intel Centrino Duo processor is given below. The code snippet is compiled with
Microsoft Visual Studio 6.0 compiler.
The assembly language instructions reveals that the two high level instructions (while(bFlag==false);
and bFlag=true;), corresponding to the operation of reading the lock variable, checking its present
value and setting it is implemented in the processor level using six low level instructions. Imagine
a situation where ‘Process 1’ read the lock variable and tested it and found that the lock is available
and it is about to set the lock for acquiring the critical section. But just before ’Process 1’ sets the
lock variable, ’Process 2’ preempts ’Process 1’ and starts executing. ’Process 2’ contains a critical
section code and it tests the lock variable for its availability. Since ’Process 1’ was unable to set
the lock variable, its state is still ’0’ and ’Process 2’ sets it and acquires the critical section. Now
the scheduler preempts ’Process 2’ and schedules ’Process 1’ before ’Process 2’ leaves the criti-
cal section. Remember, ‘Process l’ was preempted at a point just before setting the lock variable
(‘Process 1’ has already tested the lock variable just before it is preempted and found that the lock
is available). Now ’Process 1’ sets the lock variable and enters the critical section. It violates the
mutual exclusion policy and may pro-duce unpredicted results.
Device Driver
Device driver is a piece of software that acts as a bridge between the operating system and the
hardware. In an operating system based product architecture, the user applications talk to the
Operating System kernel for all necessary information exchange including communication with
the hardware peripherals. The architecture of the OS kernel will not allow direct device access
from the user application. All the device related access should flow through the OS kernel and
the OS kernel mutes it to the concerned hardware peripheral. OS provides interfaces in the form
of Application Programming Interfaces (APIs) for accessing the hardware. The device driver
abstracts the hardware from user applications. The topology of user applications and hardware
interaction in an RTOS based system is depicted in Fig.
F IGURE 3.25: Role of device driver in Embedded OS based products
Device drivers are responsible for initiating and managing the communication with the hardware
peripherals. They are responsible for establishing the connectivity, initializing the hardware (set-
ting up various registers of the hardware device) and transferring data. An embedded product may
contain different types of hardware components like Wi-Fi module, File systems, Storage device
interface, etc. The initialization of these devices and the protocols required for communicating
with these devices may be different. All these requirements are implemented in drivers and a sin-
gle driver will not be able to satisfy all these. Hence each hardware (more specifically each class
of hardware) requires a unique driver component. Certain drivers come as part of the OS kernel
and certain drivers need to be installed on the fly. For example, the program storage memory for
an embedded product, say NAND Flash memory requires a NAND Flash driver to read and write
data from/to it. This driver should come as part of the OS kernel image. Certainly the OS will not
contain the drivers for all devices and peripherals under the Sun. It contains only the necessary
drivers to communicate with the onboard devices (Hardware devices which are part of the plat-
form) and for certain set of devices supporting standard protocols and device class (Say USB Mass
storage device or HID devices like Mouse/keyboard). If an external device, whose driver software
is not available with OS kernel image, is connected to the embedded device (Say a medical device
with custom USB class implementation is connected to the USB port of the embedded product),
the OS prompts the user to install its driver manually. Device drivers which are part of the OS
image are known as ’Built-in drivers’ or ’On-board drivers’. These drivers are loaded by the OS
at the time of booting the device and are always kept in the RAM. Drivers which need to be in-
stalled for accessing a device are known. as ’Installable drivers’. These drivers are loaded by the
OS on a need basis. Whenever the device is connected, the OS loads the corresponding driver to
memory. When the device is removed, the driver is unloaded from memory. The Operating system
maintains a record of the drivers corresponding to each hardware.
The implementation of driver is OS dependent. There is no universal implementation for a driver.

How the driver communicates with the kernel is dependent on the OS structure and implementa-
tion. Different Operating Systems follow different implementations.
It is very essential to know the hardware interfacing details like the memory address assigned to
the device, the Interrupt used, etc. of on-board peripherals for writing a driver for that peripheral.
It varies on the hardware design of the product. Some Real-Time operating systems like ’Windows
CE’ support a layered architecture for the driver which separates out the low level implementation
from the OS specific interface. The low level implementation part is generally known as Platform
Dependent Device (PDD) layer. The OS specific interface part is known as Model Device Driver
(MDD) or Logical Device Driver (LDD). For a standard driver, for a specific operating system, the
MDD/LDD always remains the same and only the PDD part needs to be modified according to the
target hardware for a particular class of devices.
Most of the time, the hardware developer provides the implementation for all on board devices for a
specific OS along with the platform. The drivers are normally shipped in the form of Board Support
Package. The Board Support Package contains low level driver implementations for the onboard
peripherals and OEM Adaptation Layer (OAL) for accessing the various chip level functionalities
and a bootloader for loading the operating system. The OAL facilitates communication between
the Operating System (OS) and the target device and includes code to handle interrupts, timers,
power management, bus abstraction; generic I/O control codes (IOCTLs), etc. The driver files are
usually in the form of a dll file. Drivers can run on either user space or kernel space. Drivers
which run in user space are known as user mode drivers and the drivers which run in kernel space
are known as kernel mode drivers. User mode drivers are safer than kernel mode drivers. If an
error or exception occurs in a user mode driver, it won’t affect the services of the kernel. On the
other hand, if an exception occurs in the kernel mode driver, it may lead to the kernel crash. The
way how a device driver is written and how the interrupts are handled in it are operating system
and target hardware specific. However regardless of the OS types, a device driver implements the
following:
1. Device (Hardware) Initialization and Interrupt configuration
2. Interrupt handling and processing
3. Client interfacing (Interfacing with user applications)
The Device (Hardware) initialisation part of the driver deals with configuring the different registers
of the device (target hardware). For example configuring the I/O port line of the processor as Input
or output line and setting its associated registers for building a General Purpose IO (GPIO) driver.
The interrupt configuration part deals with configuring the interrupts that needs to be associated
with the hardware. In the case of the GPIO driver, if the intention is to generate an interrupt
when the Input line is asserted, we need to configure the interrupt associated with the I/O port by
modifying its associated registers. The basic Interrupt configuration involves the following.
1. Set the interrupt type (Edge Triggered (Rising/Flailing) or Level Triggered (Low or High)),
enable the interrupts and set the interrupt priorities.
2. Bind the Interrupt with an Interrupt Request (IRQ). The processor identifies an interrupt through
IRQ. These IRQs are generated by the Interrupt Controller. In order to identify an interrupt the
interrupt needs to be bonded to an IRQ.
3. Register an Interrupt Service Routine (ISR) with an Interrupt Request (IRQ). ISR is the han-
dler for an Interrupt. In order to service an interrupt, an ISR should be associated with an IRQ.
Registering an ISR with an IRQ takes care of it.
With these the interrupt configuration is complete. If an interrupt occurs, depending on its pri-
ority, it is serviced and the corresponding ISR is invoked. The processing part of an interrupt is
handled in an ISR. The whole interrupt processing can be done by the ISR itself or by invoking
an Interrupt Service Thread (IST). The IST performs interrupt processing on behalf of the ISR. To
make the ISR compact and short, it is always advised to use an IST for interrupt processing. The
intention of an interrupt is to send or receive command or data to and from the hardware device
and make the received data available to user programs for application specific processing. Since
interrupt processing happens at kernel level, user applications may not have direct access to the
drivers to pass and receive data. Hence it is the responsibility of the Interrupt processing routine or
thread to inform the user applications that au interrupt is occurred and data is available for further
processing. The client interfacing part of the device driver takes care of this. The client inter-
facing implementation makes use of the Inter Process communication mechanisms supported by
the embedded OS for communicating and synchronising with user applications and drivers. For
example, to inform a user application that an interrupt is occurred and the data received from the
device is placed in I shared buffer, the client interfacing code can signal (or set) an event. The user
application creates the event, registers it and waits for the driver to signal it. The driver can share
the received data through shared memory techniques. IOCTLs, shared buffers, etc. can be used
for data sharing. The story line is incomplete without performing an interrupt done (Interrupt pro-
cessing completed) functionality in the driver. Whenever an interrupt is asserted, while vectoring
to its corresponding ISR, all interrupts of equal and low’5riorities are disabled. They are re-enable
only on executing the interrupt done function (Same as the Return from Interrupt RETI instruction
execution for 8051) by the driver.
Chapter 4
EMBEDDED SOFTWARE
DEVELOPMENT TOOLS
Course Outcomes
CO 4 Make use of embedded software development tools for debugging and Apply
testing of embedded applications.
4.1 HOST AND TARGET MACHINES:
• Host: – A computer system on which all the programming tools run – Where the embedded
software is developed, compiled, tested, debugged, optimized, and prior to its translation into
target device. • Target: – After writing the program, compiled, assembled and linked, it is moved
to target – After development, the code is cross-compiled, translated – cross-assembled, linked
into target processor instruction set and located into the target.
170
Chapter 4. EMBEDDED SOFTWARE DEVELOPMENT TOOLS 171
F IGURE 4.1: Difference between host and target system
Cross Compilers:
• A cross compiler that runs on host system and produces the binary instructions that will be un-
derstood by your target microprocessor.
• A cross compiler is a compiler capable of creating executable code for a platform other than the
one on which the compiler is running. For example, a compiler that runs on aWindows 7 PC but
generates code that runs on Android smartphone is a cross compiler.
• Most desktop systems are used as hosts come with compilers, assemblers, linkers that will run
on the host. These tools are called native tools.
• Suppose the native compiler on a Windows NT system is based on Intel Pentium. This compiler
may possible if target microprocessor is also Intel Pentium. This is not possible if the target mi-
croprocessor is other than Intel i.e. like MOTOROLA, Zilog etc.
• A cross compiler that runs on host system and produces the binary instructions that will be un-
derstood by your target microprocessor. This cross compiler is a program which will do the above
task. If we write C/C++ source code that could compile on native compiler and run on host, we
could compile the same source code through cross compiler and make run it run on target also.
• That may not possible in all the cases since there is no problem with if, switch and loops state-
ments for both compilers but there may be an error with respect to the following: • In Function
declarations
• The size may be different in host and target
• Data structures may be different in two machines.
• Ability to access 16 and 32 bit entries reside at two machines.
Sometimes cross compiler may warn an error which may not be warned by native complier.
4.1.1 Cross Assemblers and Tool Chains:
• Cross assembling is necessary if target system cannot run an assembler itself.

• A cross assembler is a program that runs on host produces binary instructions appropriate for the
target. The input to the cross assembler is assembly language file (.asm file) and output is binary
file.
• A cross-assembler is just like any other assembler except that it runs on some CPU other than the
one for which it assembles code.
4.1.2 Tool chain for building embedded software shown below:
F IGURE 4.2: The figure shows the process of building software for an embedded system.
As you can see in figure the output files from each tool become the input files for the next. Because
of this the tools must be compatible with each other.
A set of tools that is compatible in this way is called tool chain. Tool chains that run on various
hosts and builds programs for various targets.
4.2 LINKER/LOCATORS FOR EMBEDDED SOFTWARE:
Linker:
– a linker or link editor is a computer program that takes one or more object files generated by a
compiler and combines them into a single executable file, library file, or another object file.
Locator:
• locate embedded binary code into target processors

• produces target machine code (which the locator glues into the RTOS) and the combined code
(called map) gets copied into the target ROM
4.2.1 Linking Process shown below:
F IGURE 4.3: An illustration of linking process.

• The native linker creates a file on the disk drive of the host system that is read by a part of
operating system called the loader whenever the user requests to run the programs.
• The loader finds memory into which to load the program, copies the program from the disk into
the memory
• Address Resolution
4.2.2 Native Tool Chain:
F IGURE 4.4: Native tool chain
Explanation for above native tool chain figure:
• Above Figure shows the process of building application software with native tools. One problem
in the tool chain must solve is that many microprocessor instructions contain the addresses of their
operands.
• the above figure MOVE instruction in ABBOTT.C will load the value of variable idunno into
register R1 must contain the address of the variable. Similarly CALL instruction must contain the
address of the whosonfirst. This process of solving problem is called address resolution.
• When abbott.c file compiling,the compiler does not have any idea what the address of idunno
and whosonfirst() just it compiles both separately and leave them as object files for linker.
• Now linker will decide that the address of idunno must be patched to whosonfirst() call instruc-
toin. When linker puts the two object files together, it figures out idunno and whosonfirst() are in
relation for execution and places in executable files.
• After loader copies the program into memory and exactly knows where idunno and whosonfirst()
are in memory. This whole process called as address resolution.
4.3 Output File Formats:
In most embedded systems there is no loader, when the locator is done then output will be copied
to target.
Therefore the locator must know where the program resides and fix up all memories.
Locators have mechanism that allows you to tell them where the program will be in the target
system. Locators use any number of different output file formats.
The tools you are using to load your program into target must understand whatever file format your
locator produces.
1. intel Hex file format
2. Motorola S-Record format
4.3.1 Intel Hex file format:
below figure shows Intel Hex file format
F IGURE 4.5: Intel Hex file format

4.3.2 Motorola S-Record format
below figure shows Intel Hex file format
F IGURE 4.6: Motorola S-Record format
4.3.3 Loading program components properly:
Another issue that locators must resolve in the embedded environment is that some parts of the
program need to end up in the ROM and some parts need to end up in RAM.
For example whosonfirst() end up in ROM and must be remembered even power is off. The vari-
able idunno would have to be in RAM, since it data may be changed.
This issue does not arise with application programming, because the loader copies the entire pro-
gram into RAM.
Most tools chains deal with this problem by dividing the programs into segments. Each segment
is a piece of program that the locator can place it in memory independently of other segments.
Segments solve other problems like when processor power on, embedded system programmer
must ensure where the first instruction is at particular place with the help of segments.
F IGURE 4.7: How the tool chain uses segments
Figure shows how a tool chain might work in a system in hypothetical system that contains three
modules X.c, Y.c and Z.asm.The code X.c contains some instructions, some uninitialized data and
some constant strings. The Y. c contains some instructions, some uninitialized and some initialized
data. The Z.asm contains some assembly language function, start up code and uninitialized code
.The cross compiler will divide X.c into 3 segments in the object file
First segment: code
Second segment: udata
Third segment: constant strings
• The cross compiler will divide Y.c into 3 segments in the object file First segment: code Second
segment: udata Third segment: idata
• The cross compiler Z.asm divides the segments into First Segment: assembly language functions
Second Segment: start up code
Third Segment t: udata
The linker/ Locator reshuffle these segments and places Z.asm start up code at where processor
begins its execution, it places code segment in ROM and data segment in RAM. Most compilers
automatically divide the module into two or more segments: The instructions (code), uninitialized
code, Initialized, Constant strings. Cross assemblers also allow you to specify the segment or
segments into which the output from the assembler should be placed. Locator places the segments
in memory. The following two lines of instructions tells one commercial locator how to build the
program.
• The –Z at the beginning of each line indicates that this line is a list of segments. • At the end of
F IGURE 4.8: Locator places segments in memory
each line is the address where the segment should be placed.

• The locator places the segments one after other in memory, starting with the given address.
• The segments CSTART, IVECS, CODE one after other must be placed at address 0.
• The segments IDATA, UDATA AND CTACK at address at 8000.
Some other features of locators are:
• We can specify the address ranges of RAM and ROM, the locator will warn you if program does
not fit within those functions.
• We can specify the address at which the segment is to end, then it will place the segment below
that address which is useful for stack memory.
• We can assign each segment into group, and then tell the locator where the group go and deal
with individual segments.
Where the variable ifreq must be stored. In the above code, in the first case ifreq the initial value
must reside in the ROM (this is the only memory that stores the data while the power is off).In the
second case the ifreq must be in RAM, because setfreq () changes it frequently.
The only solution to the problem is to store the variable in RAM and store the initial value in ROM
and copy the initial value into the variable at startup. Loader sees that each initialized variable has
the correct initial value when it loads the program. But there is no loader in embedded system, so
that the application must itself arrange for initial values to be copied into variables.
The locator deals with this is to create a shadow segment in ROM that contains all of the initial
values, a segment that is copied to the real initialized - data segment at start up. When an embedded
system is powdered on the contents of the RAM are garbage. They only become all zeros if some
start up code in the embedded system sets them as zeros.
4.4 Locator Maps:
• Most locators will create an output file, called map, that lists where the locator placed each of the
segments in memory.
• A map consists of address of all public functions and global variables.
• These are useful for debugging an ‘advanced’ locator is capable of running a startup code in
ROM, which load the embedded code from ROM into RAM to execute quickly since RAM is
faster
Locator MAP IS SHOWN BELOW:
4.4.1 Executing out of RAM:
RAM is faster than ROM and other kinds of memory like flash. The fast microprocessors (RISC)
execute programs rapidly if the program is in RAM than ROM. But they store the programs in
ROM, copy them in RAM when system starts up.
The start-up code runs directly from ROM slowly. It copies rest of the code in RAM for fast
processing. The code is compressed before storing into the ROM and start up code decompresses
when it copies to RAM.
The system will do all this things by locator, locator must build program can be stored at one
collection of address ROM and execute at other collection of addresses at RAM. Getting embedded
F IGURE 4.9: Locator MAP
software into the target system:

• The locator will build a file as an image for the target software. There are few ways to getting
the embedded software file into target system.
– PROM programmers
– ROM emulators
– In circuit emulators
– Flash
– Monitors
PROM Programmers:
• The classic way to get the software from the locator output file into target system by creating file
in ROM or PROM.
• Creating ROM is appropriate when software development has been completed, since cost to build
ROMs is quite high. Putting the program into PROM requires a device called PROM programmer
device.
• PROM is appropriate if software is small enough, if you plan to make changes to the software
and debug. To do this, place PROM in socket on the Target than being soldered directly in the
circuit (the following figure shows). When we find bug, you can remove the PROM containing
the software with the bug from target and put it into the eraser (if it is an erasable PROM) or into
the waste basket. Otherwise program a new PROM with software which is bug fixed and free,
and put that PROM in the socket. We need small tool called chip puller (inexpensive) to remove
PROM from the socket. We can insert the PROM into socket without any tool than thumb (see
figure8). If PROM programmer and the locator are from different vendors, its upto us to make
them compatible.
F IGURE 4.10: Semantic edge view of socket
4.5 ROM Emulators:
Other mechanism is ROM emulator which is used to get software into target. ROM emulator is a
device that replaces the ROM into target system. It just looks like ROM, as shown figure9; ROM
emulator consists of large box of electronics and a serial port or a network connection through
which it can be connected to your host. Software running on your host can send files created by
the locator to the ROM emulator. Ensure the ROM emulator understands the file format which the
locator creates.
Test System
4.6 In circuit emulators:
If we want to debug the software, then we can use overlay memory which is a common feature of
in-circuit emulators. In-circuit emulator is a mechanism to get software into target for debugging
purposes.
4.6.1 Flash:
If your target stores its program in flash memory, then one option you always have is to place
flash memory in socket and treat it like an EPROM .However, If target has a serial port, a network
connection, or some other mechanism for communicating with the outside world, link then target
can communicate with outside world, flash memories open up another possibility: you can write
a piece of software to receive new programs from your host across the communication link and
write them into the flash memory. Although this may seem like difficult
The reasons for new programs from host:
• You can load new software into your system for debugging, without pulling chip out of socket
and replacing.
• Downloading new software is fast process than taking out of socket, programming and returning
into the socket.
• If customers want to load new versions of the software onto your product.
The following are some issues with this approach:
• Here microprocessor cannot fetch the instructions from flash.
• The flash programming software must copy itself into the RAM, locator has to take care all these
activities how those flash memory instructions are executing.
• We must arrange a foolproof way for the system to get flash programming software into the target
i.e target system must be able to download properly even if earlier download crashes in the middle.
• To modify the flash programming software, we need to do this in RAM and then copy to flash.
4.6.2 Monitors:
It is a program that resides in target ROM and knows how to load new programs onto the system.
A typical monitor allows you to send the data across a serial port, stores the software in the target
RAM, and then runs it. Sometimes monitors will act as locator also, offers few debugging services
like setting break points, display memory and register values. You can write your own monitor
program.
4.7 DEBUGGING TECHNIQUES
I. Testing on host machine

II. using laboratory tools
III. an example system
Introduction:
While developing the embedded system software, the developer will develop the code with the lots
of bugs in it. The testing and quality assurance process may reduce the number of bugs by some
factor. But only the way to ship the product with fewer bugs is to write software with few fewer
bugs. The world extremely intolerant of buggy embedded systems. The testing and debugging
will play a very important role in embedded system software development process.
4.7.1 Testing on host machine :
• Goals of Testing process are

– Find bugs early in the development process
– Exercise all of the code
– Develop repeatable , reusable tests
– Leave an audit trail of test results
Find the bugs early in the development process:
This saves time and money. Early testing gives an idea of how many bugs you have and then how
much trouble you are in.
BUT: the target system is available early in the process, or the hardware may be buggy and unsta-
ble, because hardware engineers are still working on it.
Exercise all of the code:
Exercise all exceptional cases, even though, we hope that they will never happen, exercise them
and get experience how it works.
BUT: It is impossible to exercise all the code in the target. For example, a laser printer may have
code to deal with the situation that arise when the user presses the one of the buttons just as a paper
jams, but in the real time to test this case. We have to make paper to jam and then press the button
within a millisecond, this is not very easy to do.
Develop reusable, repeatable tests:
It is frustrating to see the bug once but not able to find it. To make refuse to happen again, we need
to repeatable tests.
BUT: It is difficult to create repeatable tests at target environment.
Example: In bar code scanner, while scanning it will show the pervious scan results every time,
the bug will be difficult to find and fix.
Leave an “Audit trail” of test result:
Like telegraph “seems to work” in the network environment as it what it sends and receives is not
easy as knowing, but valuable of storing what it is sending and receiving.BUT: It is difficult to
keep track of what results we got always, because embedded systems do not have a disk drive.
Conclusion:
Don’t test on the target, because it is difficult to achieve the goals by testing software on target
system. The alternative is to test your code on the host system.
Basic Technique to Test:
The following figure shows the basic method for testing the embedded software on the develop-
ment host. The left hand side of the figure shows the target system and the right hand side shows
how the test will be conducted on the host. The hardware independent code on the two sides of the
figure is compiled from the same source. The hardware and hardware dependent code has been re-
F IGURE 4.11: Test System
placed with test scaffold software on the right side. The scaffold software provides the same entry
points as does the hardware dependent code on the target system, and it calls the same functions
in the hardware independent code. The scaffold software takes its instructions from the keyboard
or from a file; it produces output onto the display or into the log file.
Conclusion: Using this technique you can design clean interface between hardware independent
software and rest of the code.
Calling Interrupt Routines by scaffold code:
Based on the occurrence of interrupts tasks will be executed. Therefore, to make the system do
anything in the test environment, the test scaffold must execute the interrupt routines. Interrupts
have two parts one which deals with hardware (by hardware dependent interrupt calls) and other
deals rest of the system (hardware independent interrupt calls).
Calling the timer interrupt routine:
One interrupt routine your test scaffold should call is the timer interrupt routine. In most embedded
systems initiated the passage of time and timer interrupt at least for some of the activity. You could
have the passage of time in your host system call the timer interrupt routine automatically. So time
goes by your test system without the test scaffold software participation. It causes your test scaffold
to lose control of the timer interrupt routine. So your test scaffold must call Timer interrupt routine
directly.
Script files and Output files:
A test scaffold that calls the various interrupt routines in a certain sequence and with certain data.
A test scaffold that reads a script from the keyboard or from a file and then makes calls as directed
by the script. Script file may not be a project, but must be simple one. Example: script file to test
the bar code scanner
#frame arrives
# Dst Src Ctrl mr/56 ab
#Backoff timeout expires Kt0
#timeout expires again Kt0
#sometime pass Kn2
#Another beacon frame arrives
Each command in this script file causes the test scaffold to call one of the interrupts in the hard-
ware independent part.
In response to the kt0 command the test scaffold calls one of the timer interrupt routines. In re-
sponse to the command kn followed by number, the test scaffold calls a different timer interrupt
routine the indicated number of times. In response to the command mr causes the test scaffold to
write the data into memory.
Features of script files:
• The commands are simple two or three letter commands and we could write the parser more
quickly.
• Comments are allowed, comments script file indicate what is being tested, indicate what results
you expect, and gives version control information etc.
• Data can be entered in ASCII or in Hexadecimal.
Most advanced Techniques:
These are few additional techniques for testing on the host. It is useful to have the test scaffold
software do something automatically. For example, when the hardware independent code for the
underground tank monitoring system sends a line of data to the printer, the test scaffold software
must capture the line, and it must call the printer interrupt routine to tell the hardware independent
code that the printer is ready for the next line.
There may be a need that test scaffold a switch control because there may be button interrupt
routine, so that the test scaffold must be able to delay printer interrupt routine. There may be
low, medium, high priority hardware independent requests, then scaffold switches as they appear.
Some Numerical examples of test scaffold software: In Cordless bar code scanner, when H/W in-
dependent code sends a frame the scaffold S/W calls the interrupt routine to indicate that the frame
has been sent. When H/W independent code sets the timer, then test scaffold code call the timer
interrupt after some period. The scaffold software acts as communication medium, which contains
multiple instances of H/W independent code with respect to multiple systems in the project.
Bar code scanner Example:
Here the scaffold software generate an interrupts when ever frame send and receive. Bar code
Scanner A send data frame, captures by test scaffold and calls frame sent interrupt. The test
scaffold software calls receive frame interrupt when it receives frame. When any one of the H/W
independent code calls the function to control radio, the scaffold knows which instances have
turned their radios, and at what frequencies. Targets that have their radios turned off and tuned
F IGURE 4.12: Test scaffold for the barcode scanner software
to different frequencies do not receive the frame. The scaffold simulates the interference that
prevents one or more stations from receiving the data. In this way the scaffold tests various pieces
of software communication properly with each other or not.(see the above figure).
OBJECTIONS, LIMITATIONS AND SHORT COMINGS:
Engineers raise many objections to testing embedded system code on their host system, Because
many embedded systems are hardware dependent. Most of the code which is tested at host side is
hardware dependent code.

To test at host side embedded systems interacts only with the microprocessor, has no direct contact
with the hardware. As an example the Telegraph software huge percentage of software is hard-
ware independent i.e. this can be tested on the host with an appropriate scaffold. There are few
objections to scaffold: Building a scaffold is more trouble, making compatible to RTOS is other
tedious job.
4.8 Using laboratory Tools:
• Volt meters and Ohm Meters

• Oscilloscopes
• Logic Analyzers
• Logic Analyzers in Timing mode
• Logic Analyzers in State Mode
• In-circuit Emulators
• Getting “ Visibility” into the Hardware
• Software only Monitors
• Other Monitors
4.8.1 Volt meters:
Volt meter is for measuring the voltage difference between two points. The common use of volt-
meter is to determine whether or not chip in the circuit have power. A system can suffer power
failure for any number of reasons- broken leads, incorrect wiring, etc. the usual way to use a volt
meter It is used to turn on the power, put one of the meter probes on a pin that should be attached
to the VCC and the other pin that should be attached to ground. If volt meter does not indicate the
correct voltage then we have hardware problem to fix.
4.8.2 Ohm Meters:
Ohm meter is used for measuring the resistance between two points, the most common use of Ohm
meter is to check whether the two things are connected or not. If one of the address signals from
microprocessors is not connected to the RAM, turn the circuit off, and then put the two probes
on the two points to be tested, if ohm meter reads out 0 ohms, it means that there is no resistance
between two probes and that the two points on the circuit are therefore connected. The product
commonly known as Multimeter functions as both volt and Ohm meters.
4.8.3 Oscilloscopes:
It is a device that graphs voltage versus time, time and voltage are graphed horizontal and vertical
axis respectively. It is analog device which signals exact voltage but not low or high.
Features of Oscilloscope:
• You can monitor one or two signals simultaneously.

• You can adjust time and voltage scales fairly wide range.
• You can adjust the vertical level on the oscilloscope screen corresponds to ground. With the use
of trigger, oscilloscope starts graphing. For example we can tell the oscilloscope to start graphing
when signal reaches 4.25 volts and is rising.
Oscilloscopes extremely useful for Hardware engineers, but software engineers use them for the
following purposes:
1. Oscilloscope used as volt meter, if the voltage on a signal never changes, it will display hori-
zontal line whose location on the screen tells the voltage of the signal.
2. If the line on the Oscilloscope display is flat, then no clocking signal is in Microprocessor and
it is not executing any instructions.
3. Use Oscilloscope to see as if the signal is changing as expected.
4. We can observe a digital signal which transition from VCC to ground and vice versa shows there
is hardware bug. Figure is a sketch of a typical oscilloscope, consists of probes used to connect
the oscilloscope to the circuit. The probes usually have sharp metal ends holds against the signal
F IGURE 4.13: Typical Oscilloscope
on the circuit. Witch’s caps fit over the metal points and contain little clip that hold the probe in
the circuit. Each probe has ground lead a short wire that extends from the head of the probe, it
can easily attach to the circuit. It is having numerous adjustment knobs and buttons allow you to
control. Some may have on screen menus and set of function buttons along the side of the screen.
F IGURE 4.14: A Reasonable clock signal

F IGURE 4.15: A Questionable clock signal
F IGURE 4.16: A dead clock signal
4.8.4 Logic Analyzers:
This tool is similar to oscilloscope, which captures signals and graphs them on its screen. But it
differs with oscilloscope in several fundamental ways
• A logic analyzer track many signals simultaneously.
• The logic analyzer only knows 2 voltages, VCC and Ground. If the voltage is in between VCC
and ground, then logical analyzer will report it as VCC or Ground but not like exact voltage.
• All logic analyzers are storage devices. They capture signals first and display them later.
F IGURE 4.17: A ROM chip selection signal
• Logic analyzers have much more complex triggering techniques than oscilloscopes.
• Logical analyzers will operate in state mode as well as timing mode.
4.8.5 Logical analyzers in Timing Mode:
Some situations where logical analyzers are working in Timing mode

• If certain events ever occur.
• Example: In bar code scanner software ever turns the radio on, we can attach logic analyzer to
the signals that controls the power to the radio.
• We can measure how long it takes for software to respond.
• We can see software puts out appropriate signal patterns to control the hardware. The under-
ground tank monitoring system to find out how long it will takes the software to turn off the bell
when you push a button shown in fig.
Example: After finishing the data transmitting, we can attach the logical analyzer to RTS and its
signal to find out if software lowers RTS at right time or early or late. We can also attach the logical
analyzer, to ENABLE/ CLK and DATA signals to EEPROM to find if it works correctly or not.
Figure shows a typical logic analyzer. They have display screens similar to those of oscilloscopes.
Most logic analyzers present menus on the screen and give you a keyboard to enter choices, some
may have mouse as well as network connections to control from work stations. Logical analyzers
include hard disks and diskettes. It can be attached to many signals through ribbons. Since logic
F IGURE 4.18: Logic analyzer timing display: Button and Alarm signal
analyzer can attach to many signals simultaneously, one or more ribbon cables typically attach to
the analyzer.
Logical Analyzer in State Mode:
In the timing mode, logical analyzer is self clocked. That is, it captures data without reference
to any events on the circuit. In state mode, they capture data when some particular event occur,
called a clock occurs in the system. In this mode the logical analyzer see what instructions the
microprocessor fetched and what data it read from and write to its memory and I/O devices. To
see what instructions the microprocessor fetched, you connect logical analyzer probes to address
and data signals of the system and RE signal on the ROM. Whenever RE signal raise then logical
analyzer capture the address and data signals. The captured data is called as trace. The data is
valid when RE signal raise. State mode analyzers present a text display as state of signals in row
as shown in the below figure. The logical analyzer in state mode extremely useful for the software
F IGURE 4.19: Logic Analyzer timing Display: Data and RTS signal
engineer,
1. Trigger the logical analyzer, if processor never fetch if there is no memory.
2. Trigger the logical analyzer, if processor writes an invalid value to a particular address in RAM.
3. Trigger the logical analyzer, if processor fetches the first instruction of ISR and executed.
4. If we have bug that rarely happens, leave processor and analyzer running overnight and check
results in the morning.
5. There is filter to limit what is captured.
Logical analyzers have short comings:
Even though analyzers tell what processor did, we cannot stop, break the processor, even if it did
wrong. By the analyzer the processors registers are invisible only we know the contents of memory
in which the processors can read or write. If program crashes, we cannot examine anything in the
system. We cannot find if the processor executes out of cache. Even if the program crashes, still
emulator let make us see the contents of memory and registers. Most emulators capture the trace
F IGURE 4.20: Logic analyzer
F IGURE 4.21: Typical logic analyzer state mode display
like analyzers in the state mode. Many emulators have a feature called overlay memory, one or
more blocks of memory inside the emulator, emulated microprocessor can use instead of target
machine.
4.9 In circuit emulators:
In-circuit emulators also called as emulator or ICE replaces the processor in target system.
Ice appears as processor and connects all the signals and drives. It can perform debugging, set
break points after break point is hit we can examine the contents of memory, registers, see the
source code, resume the execution. Emulators are extremely useful, it is having the power of de-
bugging, acts as logical analyzer. Advantages of logical analyzers over emulators:
• Logical analyzers will have better trace filters, more sophisticated triggering mechanisms.
• Logic analyzers will also run in timing mode.
• Logic analyzers will work with any microprocessor.
• With the logic analyzers you can hook up as many as or few connections as you like. With the
emulator you must connect all of the signal.
• Emulators are more invasive than logic analyzers.
4.9.1 Software only Monitors:
One widely available debugging tool often called as Monitor .monitors allow you to run software
on the actual target, giving the debugging interface to that of In circuit emulator.
Monitors typically work as follows:
• One part of the monitor is a small program resides in ROM on the target, this knows how to
receive software on serial port, across network, copy into the RAM and run on it. Other names for
monitor are target agent, monitor, debugging kernel and so on.
• Another part the monitor run on host side, communicates with debugging kernel, provides de-
bugging interface through serial port communication network.
• You write your modules and compile or assemble them.
• The program on the host cooperates with debugging kernel to download compiled module into
the target system RAM. Instruct the monitor to set break points, run the system and so on.
• You can then instruct the monitor to set breakpoints. See the above figure, Monitors are extraor-
dinarily valuable, gives debugging interface without any modifications.
4.9.2 Disadvantages of Monitors:
• The target hardware must have communication port to communicate the debugging kernel with
host program. We need to write the communication hardware driver to get the monitor working.
• At some point we have to remove the debugging kernel from your target system and try to run
the software without it.
F IGURE 4.22: software only the monitor
• Most of the monitors are incapable of capturing the traces like of those logic analyzers and
emulators.
• Once a breakpoint is hit, stop the execution can disrupt the real time operations so badly.
Other Monitors:
The other two mechanisms are used to construct the monitors, but they differ with normal monitor
in how they interact with the target. The first target interface is with through a ROM emulator.
This will do the downing programs at target side, allows the host program to set break points and
other various debugging techniques.
Chapter 5
INTRODUCTION TO ADVANCED
PROCESSORS
Course Outcomes
CO 5 Illustrate the architecture, memory organization and instruction level Understand
parallelism of ARM and SHARC processors used in Embedded Sys-
tems.
CO 6 Interpret the concepts of Internet of Things used in the embedded sys- Understand
tems applications.
5.1 INTRODUCTION TO ADVANCED ARCHITECTURES:
Two Computing architectures are available

1. Von Neumann architecture computer
2. Harvard architecture
199
5.1.1 Von Neumann architecture computer:
• The memory holds both data and instructions, and can be read or written when given an address.
A computer whose memory holds both data and instructions is known as a von Neumann machine
• The CPU has several internal registers that store values used internally. One of those registers is
the program counter (PC) ,which holds the address in memory of an instruction.
• The CPU fetches the instruction from memory, decodes the instruction, and executes it.
• The program counter does not directly determine what the machine does next, but only indirectly
by pointing to an instruction in memory.
F IGURE 5.1: Von Neumann architecture computer
5.1.2 Harvard architecture:
• Harvard machine has separate memories for data and program.

• The program counter points to program memory, not data memory.
• As a result, it is harder to write self-modifying programs (programs that write data values, then
use
Those values as instructions on Harvard machines.
Advantage:
• The separation of program and data memories provides higher performance for digital signal
processing.
F IGURE 5.2: Harvard architecture
Differences between Von neumann and harvard architecture:
F IGURE 5.3: Differences between Von neumann and harvard architecture

RISC and CISC Processors:
F IGURE 5.4: Difference between RISC and CISC Processors
5.2 ARM(Advanced RISC Machine) Processor:
• ARM uses RISC architecture

• ARM uses assembly language for writing programs
• ARM instructions are written one per line, starting after the first column.
• Comments begin with a semicolon and continue to the end of the line.
• A label, which gives a name to a memory location, comes at the beginning of the line, starting in
the first column.
Here is an example: LDR r0,[r8]; a comment
label ADD r4,r0,r1
5.2.1 Memory Organization in ARM Processor:
The ARM architecture supports two basic types of data:

• The standard ARM word is 32 bits long.
• The word may be divided into four 8-bit byte
• ARM allows addresses up to 32 bits long
• The ARM processor can be configured at power-up to address the bytes in a word in either little-
endian mode (with the lowest-order byte residing in the low-order bits of the word) or big-endian
mode
F IGURE 5.5: Byte organization within word
5.2.2 Data Operations in ARM:
• In the ARM processor, arithmetic and logical operations cannot be performed directly on memory
locations.
• ARM is a load-store architecture—data operands must first be loaded into the CPU and then
stored back to main memory to save the results
ARM Programming Model:
1. Programming model gives information about various registers supported by ARM.

2. ARM has 16 general-purpose registers, r0 to r15.
3. Except for r15, they are identical—any operation that can be done on one of them can be done
on the other one also
4. r15 register is also used as program counter(PC)
5. current program status register (CPSR)
• This register is set automatically during every arithmetic, logical, or shifting operation.
• The top four bits of the CPSR hold the following useful information about the results of that
arithmetic/logical operation:
• The negative (N) bit is set when the result is negative in two’s-complement arithmetic.
• The zero (Z) bit is set when every bit of the result is zero.
• The carry (C) bit is set when there is a carry out of the operation.
• The overflow (V ) bit is set when an arithmetic operation results in an overflow.
F IGURE 5.6: Basic ARM programming model

5.3 Types of Instructions supported by ARM Processor:
1. Arithmetic Instructions
2. Logical Instructions
3. shift / rotate Instructions
4. Comparison Instructions
5. move instructions
F IGURE 5.7: Arithmetic Instructions
F IGURE 5.8: MOV Instructions
Instructions examples:
ADD r0,r1,r2
This instruction sets register r0 to the sum of the values stored in r1 and r2.
F IGURE 5.9: Load store Instructions
ADD r0,r1,#2 (immediate operand are allowed during addition)

RSB r0, r1, r2 sets r0 to be r2-r1.
bit clear: BIC r0, r1, r2 sets r0 to r1 and not r2.
Multiplication:
no immediate operand is allowed in multiplication

two source operands must be different registers
MLA: The MLA instruction performs a multiply-accumulate operation, particularly useful in ma-
trix operations and signal processing
MLA r0,r1,r2,r3 sets r0 to the value r1x r2+r3.
Shift operations:
Logical shift(LSL, LSR)
Arithmetic shifts (ASL, ASR)
• A left shift moves bits up toward the most-significant bits,
• right shift moves bits down to the least-significant bit in the word.
• The LSL and LSR modifiers perform left and right logical shifts, filling the least-significant bits
of the operand with zeroes.
• The arithmetic shift left is equivalent to an LSL, but the ASR copies the sign bit—if the sign is
0, a 0 is copied, while if the sign is 1, a 1 is copied. Rotate operations: (ROR, RRX)
• The rotate modifiers always rotate right, moving the bits that fall off the least-significant bit up
to the most-significant bit in the word.
• The RRX modifier performs a 33-bit rotate, with the CPSR’s C bit being inserted above the sign
bit of the word; this allows the carry bit to be included in the rotation
Compare instructions: (CMP, CMN)
• compare instruction modifies flags values (Negative flag, zero flag, carry flag, Overflow flag)
• CMP r0, r1 computes r0 – r1, sets the status bits, and throws away the result of the subtraction.
• CMN uses an addition to set the status bits.
• TST performs a bit-wise AND on the operands,
• while TEQ performs an exclusive-or
Load store instructions:
• ARM uses register-indirect addressing

• The value stored in the register is used as the address to be fetched from memory; the result of
that fetch is the desired operand value.
• LDR r0,[r1] sets r0 to the value of memory location 0x100.
• Similarly, STR r0,[r1] would store the contents of r0 in the memory location whose address is
given in r1
LDR r0,[r1, – r2]
5.4 ARM Register indirect addressing:
LDR r0,[r1, #4] loads r0 from the address r1+ 4.
ARM Base plus offset addressing mode:
the register value is added to another value to form the address. For instance, LDR r0,[r1,#16]
loads r0 with the value stored at location r1+16.( r1-base address, 16 is offset) Auto-indexing
F IGURE 5.10: Register Indirect ddressing
F IGURE 5.11: offset addressing mode
updates the base register, such that LDR r0,[r1,#16]! —-first adds 16 to the value of r1, and then
uses that new value as the address. The ! operator causes the base register to be updated with the
computed address so that it can be used again later.
Post-indexing does not perform the offset calculation until after the fetch has been performed.
Consequently,
LDR r0,[r1],#16 will load r0 with the value stored at the memory location whose address is given
by r1, and then add 16 to r1 and set r1 to the new value.
FLOW OF CONTROL INSTRUCTIONS (Branch Instructions):
Branch Instructions
1. conditional instructions(BGE– B is branch, GE is condition)
2. unconditional instructions(B)
the following branch instruction B #100 will add 400 to the current PC value example for flow of
F IGURE 5.12: Condition codes in ARM
control programs: Branch and Link instruction (BL) for implementing functions or sub routines
or procedures:
5.5 SHARC Processor:
Features of SHARC processor:

1. SHARC stands for Super Harvard Architecture Computer
2. The ADSP-21060 SHARC chip is made by Analog Devices, Inc.
3. It is a 32-bit signal processor made mainly for sound, speech, graphics, and imaging applica-
tions.
4. It is a high-end digital signal processor designed with RISC techniques.
5. Number formats:
i. 32-bit Fixed Format
Fractional/Integer
Unsigned/Signed
ii. Floating Point
32-bit single-precision IEEE floating-point data format
40-bit version of the IEEE floating-point data format.
16-bit shortened version of the IEEE floating-point data format.
F IGURE 5.13: example for flow of control programs
6. 32 Bit floating point, with 40 bit extended floating point capabilities.

7. Large on-chip memory.
8. Ideal for scalable multi-processing applications.
9. Program memory can store data.
10. Able to simultaneously read or write data at one location and get instructions from another
place in memory.
11. 2 buses
Data memory bus.
Program bus.
12. Either two separate memories or a single dual-port memory
13. The SHARC incorporates features aimed at optimizing such loops.
F IGURE 5.14: Procedure call in ARM
14. High-Speed Floating Point Capability

15. Extended Floating Point
16. The SHARC supports floating, extended-floating and non-floating point.
17. No additional clock cycles for floating point computations.
18. Data automatically truncated and zero padded when moved between 32-bit memory and inter-
nal registers.
5.6 SHARC PROCESSOR PROGRAMMING MODEL:
Programming model gives the registers details. The following registers are used in SHARC pro-
cessors for various purposes:
• Register files: R0-R15 (aliased as F0-F15 for floating point)
• Status registers.
• Loop registers.
• Data address generator registers(DAG1 and DAG2)
• Interrupt registers.
• 16 primary registers (R0-R15)

• 16 alternate registers (F0-F15)
• each register can hold 40 bits
• R0 – R15 are for Fixed-Point Numbers
• F0 – F15 are for Floating-Point Numbers
F IGURE 5.15: SHARC Programming model
5.6.1 Status registers:
ASTAT: arithmetic status.

STKY: sticky.
MODE 1: mode 1.
• The STKY register is a sticky version of ASTAT register, the STKY bits are set along with
ASTAT register bits but not cleared until cleared by an instruction.
• The SHARC perform saturation arithmetic on fixed point values, saturation mode is controlled
by ALUSAT bit in MODE1 register.
• All ALU operations set AZ (zero), AN (negative), AV (overflow), AC (fixed-point carry), AI
(floating-point invalid) bits in ASTAT.
Data Address Generators(DAG)
F IGURE 5.16: Data Address Generators
F IGURE 5.17: DAG 1 Register

F IGURE 5.18: DAG 2 Register
5.6.2 Multifunction computations or instruction level parallel processing:
Can issue some computations in parallel:

• dual add-subtract;
• fixed-point multiply/accumulate and add, subtract, average
• floating-point multiply and ALU operation
• multiplication and dual add/subtract
5.7 Pipelining in SHARC processor:
Instructions are processed in three cycles:

Fetch instruction from memory
Decode the opcode and operand
Execute the instruction
• SHARC supports delayed and non-delayed branches
• Specified by bit in branch instruction
• 2 instruction branch delay slot
• Six Nested Levels of Looping in Hardware
Bus Architecture:
Twin Bus Architecture:

1 bus for Fetching Instructions
1 bus for Fetching Data
Improves multiprocessing by allowing more steps to occur during each clock
5.7.1 Addressing modes provided by DAG in SHARC Processor:
1. The Simplest addressing mode

2. Absolute address
3. post modify with update mode
4. base-plus-offset mode
5. Circular Buffers
6. Bit reversal addressing mode
1. The Simplest addressing mode provides an immediate value that can represent the address.
Example : R0=DM(0X200000)
R0=DM( a) i.e load R0 with the contents of the variable a
2. An Absolute address has entire address in the instruction, space inefficient, address occupies
the more space.
3. A post modify with update mode allows the program to sweep through a range of address. This
uses I register and modifier, I registers shows the address value and modifier (M register value or
Immediate value) is update the value.
For load
R0=DM(I3,M1)
For store : DM(I3,M1)=R0
4. The base-plus-offset mode here the address computed as I+M where I is the base and M modi-
fier or offset.
Example: R0=DM(M1, I0)
I0=0x2000000 and M0= 4 then the value for R0 is loaded from 0x2000004
5. Circular Buffers is an array of n elements is n+1th element is referenced then the location is 0.
It is wrapping around from end to beginning of the buffer.
This mode uses L and B registers, L registers is set with +ve and nonzero value at staring point, B
register is stored with same value as the I register is store with base address.
If I register is used in post modify mode, the incremental value is compared to the sum of L and B
registers, if end of the buffer is reached then I register is wrapped around.
6. Bit reversal addressing mode : this is used in Fast Fourier Transform (FFT ). Bit reversal can be
performed only in I0 and I8 and controlled by BR0 and BR8 bits in the MODE1 register.
SHARC allows two fetches per cycle.
F0=DM(M0,I0); FROM DATA MEMORY
F1=PM(M8,I8); FROM PROGRAM MEMORY
5.7.2 BASIC addressing:
Immediate value:
R0 = DM(0x20000000);
Direct load:
R0 = DM( a); ! Loads contents of a
Direct store:
DM( a)= R0; ! Stores R0 at a
SHARC programs examples:
expression:
x = (a + b) - c;
program:
R0 = DM( a) Load a
R1 = DM( b); Load b
R3 = R0 + R1;
R2 = DM( c); Load c

R3 = R3-R2;
DM( x) = R3; Store result in x
expression :
y = a*(b+c);
program:
R1 = DM( b) Load b
R2 = DM( c); Load c
R2 = R1 + R2;
R0 = DM( a); Load a
R2 = R2*R0;
DM( y) = R23; Store result in y
note: for programs , refer class notes
SHARC jump:
Unconditional flow of control change:

JUMP foo
Three addressing modes:
direct;
indirect;
PC-relative.
5.7.3 ARM vs. SHARC
• ARM7 is von Neumann architecture • ARM9 is Harvard architecture • SHARC is modified

Harvard architecture. – On chip memory (¿ 1Gbit) evenly split between program memory (PM)
and data memory (DM) – Program memory can be used to store some data. – Allows data to be
fetched from both memory in parallel
5.7.4 The SHARC ALU operations:
1. Fixed point ALU operations

2. Floating point ALU operations
3. Shifter operations in SHARC
F IGURE 5.19: Fixed point ALU operations

F IGURE 5.20: Floating point ALU operations
F IGURE 5.21: Shifter operations in SHARC
BUS PROTOCOLS:
For serial data communication between different peripherals components , the following standards
are used :
• VME
• PCI
• ISA etc
For distributing embedded applications, the following interconnection network protocols are there:
• I2 C
• CAN etc
5.8 I2 C :
• The I2 C bus is a well-known bus commonly used to link microcontrollers into systems
• I2 C is designed to be low cost, easy to implement, and of moderate speed up to 100 KB/s for the
standard bus and up to 400 KB/s for the extended bus
• it uses only two lines: the serial data line (SDL) for data and the serial clock line (SCL), which
indicates when valid data are on the data line
F IGURE 5.22: I2 C bus system
The basic electrical interface of I2 C to the bus is shown in Figure
F IGURE 5.23: Electrical Interface to I2 C bus

• A pull-up resistor keeps the default state of the signal high, and transistors are used in each bus
device to pull down the signal when a 0 is to be transmitted.
• Open collector/open drain signaling allows several devices to simultaneously write the bus with-
out causing electrical damage.
• The open collector/open drain circuitry allows a slave device to stretch a clock signal during a
read from a slave.
• The master is responsible for generating the SCL clock, but the slave can stretch the low period
of the clock
• The I2 C bus is designed as a multimaster bus—any one of several different devices may act as
the master at various times.
• As a result, there is no global master to generate the clock signal on SCL. Instead, a master drives
both SCL and SDL when it is sending data. When the bus is idle, both SCL and SDL remain high.
• When two devices try to drive either SCL or SDL to different values, the open collector/ open
drain circuitry prevents errors
Address of devices:
• A device address is 7 bits in the standard I2 C definition (the extended I2 C allows 10-bit ad-
dresses).
• The address 0000000 is used to signal a general call or bus broadcast, which can be used to signal
all devices simultaneously. A bus transaction comprised a series of 1-byte transmissions and an
address followed by one or more data bytes.
data-push programming :
• I2 C encourages a data-push programming style. When a master wants to write a slave, it transmits
the slave’s address followed by the data.
• Since a slave cannot initiate a transfer, the master must send a read request with the slave’s
address and let the slave transmit the data.
• Therefore, an address transmission includes the 7-bit address and 1 bit for data direction: 0 for
writing from the master to the slave and 1 for reading from the slave to the master.
F IGURE 5.24: Format of I2 C adress transmission
5.8.1 Bus transaction or transmission process:
1) start signal (SCL high and sending 1 to 0 in SDL)

2) followed by device address of 7 bits
3) RW(read / write bit) set to either 0 or 1
4) after address, now the data will be sent
5) after transmitting the complete data, the transmission stops.
The below figure is showing write and read bus transaction:
F IGURE 5.25: Typical bus transmission
State transition graph:
F IGURE 5.26: State transition of I2 C bus master

5.8.2 Transmitting byte in I2 C Bus (Timing Diagram):
F IGURE 5.27: Transmitting byte in I2 C Bus
1. initially, SCL will be high, SDL will be low.

2. data byte will be transmitted.
3. after transmitting every 8 bits, an Acknowledgement will come
4. then stop signal is issued by setting both SCL and SDL high.
5.8.3 I2 C interface on a microcontroller:
F IGURE 5.28: I2 C interface on a microcontroller
5.9 Controlled Area Network:
The CAN bus was designed for automotive electronics and was first used in production cars in
1991.
The CAN bus uses bit-serial transmission. CAN runs at rates of 1 MB/s over a twisted pair con-
nection of 40 m.
An optical link can also be used. The bus protocol supports multiple masters on the bus. The
F IGURE 5.29: Physical and electrical organization of CAN bus
above figure shows CAN electrical interface:

• each node in the CAN bus has its own electrical drivers and receivers that connect the node to
the bus in wired-AND fashion.
• In CAN terminology, a logical 1 on the bus is called recessive and a logical 0 is dominant.
• The driving circuits on the bus cause the bus to be pulled down to 0 if any node on the bus pulls
the bus down (making 0 dominant over 1).
• When all nodes are transmitting 1s, the bus is said to be in the recessive state; when a node
transmits a 0, the bus is in the dominant state. Data are sent on the network in packets known as
data frames.
5.9.1 CAN DATA FRAME:
Explanation for data frame :

• A data frame starts with a 1 and ends with a string of seven zeroes. (There are at least three bit
fields between data frames.)
F IGURE 5.30: CAN DATA FRAME
• The first field in the packet contains the packet’s destination address and is known as the arbitra-
tion field. The destination identifier is 11 bits long.
• The trailing remote transmission request (RTR) bit is set to 0 if the data frame is used to request
data from the device specified by the identifier.
• When RTR 1, the packet is used to write data to the destination identifier.
• The control field provides an identifier extension and a 4-bit length for the data field with a 1 in
between. The data field is from 0 to 64 bytes, depending on the value given in the control field.
• A cyclic redundancy check (CRC) is sent after the data field for error detection.
• The acknowledge field is used to let the identifier signal whether the frame was correctly re-
ceived: The sender puts a recessive bit (1) in the ACK slot of the acknowledge field; if the receiver
detected an error, it forces the value to a dominant (0) value.
• If the sender sees a 0 on the bus in the ACK slot, it knows that it must retransmit. The ACK slot
is followed by a single bit delimiter followed by the end-of-frame field.
5.9.2 Architecture of CAN controller:
• The controller implements the physical and data link layers;

• since CAN is a bus, it does not need network layer services to establish end-to-end connections. •
The protocol control block is responsible for determining when to send messages, when a message
must be resent due to arbitration losses, and when a message should be received.
F IGURE 5.31: Architecture of CAN controller
5.10 INTERNET ENABLED SYSTEMS:
5.10.1 IP Protocol:
• The Internet Protocol (IP) is the fundamental protocol on the Internet.

• It provides connectionless, packet-based communication.
• it is an internetworking standard.
• an Internet packet will travel over several different networks from source to destination.
• The IP allows data to flow seamlessly through these networks from one end user to another •
Figure explanation:
• IP works at the network layer.
• When node A wants to send data to node B, the application’s data pass through several layers of
the protocol stack to send to the IP.
• IP creates packets for routing to the destination, which are then sent to the data link and physical
layers.
• A node that transmits data among different types of networks is known as a router.
F IGURE 5.32: Protocal utilization in internet communication
5.10.2 IP Packet Format:
• The header and data payload are both of variable length.

• The maximum total length of the header and data payload is 65,535 bytes.
• An Internet address is a number (32 bits in early versions of IP, 128 bits in IPv6). The IP address
is typically written in the form xxx.xx.xx.xx.
• packets that do arrive may come out of order. This is referred to as best-effort routing. Since
routes for data may change quickly with subsequent packets being routed along very different paths
with different delays, real-time performance of IP can be hard to predict. Relationships between
F IGURE 5.33: IP Packet structure
IP and higher-level Internet services: Using IP as the foundation, TCP is used to provide File
Transport Protocol for batch file transfers, Hypertext Transport Protocol (HTTP) for World Wide
Web service, Simple Mail Transfer Protocol for email, and Telnet for virtual terminals. A separate
transport protocol, User Datagram Protocol, is used as the basis for the network management
services provided by the Simple Network Management Protocol
F IGURE 5.34: Internet service stack

CSE_ESD_LN_UG24

Uploaded by

Document Informationclick to expand document information

Document Informationclick to expand document information

Copyright:

Available Formats

CSE_ESD_LN_UG24

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

CSE_ESD_LN_UG24

Uploaded by

Copyright:

Available Formats

1

INSTITUTE OF AERONAUTICAL ENGINEERING

D EPARTMENT O F COMPUTER SCIENCE AND ENGINEERING

2 INTRODUCTION TO EMBEDDED C AND APPLICATIONS 44

2.2 REGISTER ALLOCATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

3 RTOS BASED EMBEDDED SYSTEM DESIGN 101

3.6 Threads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

4 EMBEDDED SOFTWARE DEVELOPMENT TOOLS 170

5 INTRODUCTION TO ADVANCED PROCESSORS 199

5.5 SHARC Processor: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209

1.1 General block diagram of a digital camera . . . . . . . . . . . . . . . . . . . . . 12

2.1 C compiler register usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

2.2 ATPCS Argument passing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

3.1 Architecture of Operating System . . . . . . . . . . . . . . . . . . . . . . . . . 102

4.1 Difference between host and target system . . . . . . . . . . . . . . . . . . . . . 171

5.1 Von Neumann architecture computer . . . . . . . . . . . . . . . . . . . . . . . . 200

1.1 Definition of Embedded System

Embedded System Definition:

1.1.1 The components of embedded system hardware:

A microcontroller (sometimes abbreviated µC, µC or MCU) is a small computer on a single inte-

DIGITAL SIGNAL PROCESSOR:

EMBEDDED COMPUTING SYSTEM:

An embedded system is a special-purpose system in which the computer is completely encapsu-

1.2 Embedded Systems Vs. General Computing Systems:

General Purpose Computing System Embedded Systems

1.3 History of Embedded Systems

1.4 Classification of Embedded Systems

2. Complexity and performance requirements.

3. Based on deterministic behaviour.

1.4.1 Classification Based on Generation

1.4.1.1 First Generation:

1.4.1.2 Second Generation:

1.4.1.3 Third Generation:

1.4.1.4 Fourth Generation:

1.4.1.5 What Next?

1.4.2 Classification Based on Complexity and Performance

1.4.2.1 Small-Scale Embedded Systems

1.4.2.2 Medium-Scale Embedded Systems

1.4.2.3 Large-Scale Embedded Systems/Complex Systems

1.4.3 Classification Based on deterministic behavior

1.4.4 Classification Based on triggering

1.5 Complex Systems and Microprocessors

Examples of Embedded Systems:

1.5.1 Digital camera:

Fig. 1.1 shows a general block diagram of a digital camera.

F IGURE 1.1: General block diagram of a digital camera

2. An image is made up of picture elements (pixels) of different luminance and chrominance.

7. A standard computer interface provides a mechanism for transferring images to a computer

1.6 Major Application Areas of Embedded Systems

We can find applications of embedded systems in following sectors:

1.7 The Embedded System Design Process

The embedded system design process aimed at two objectives.

F IGURE 1.2: Major levels of abstraction in the design process

We need to consider the major goals of the design:

• Manufacturing cost includes the cost of components and assembly.

TABLE 1.2: Sample requirements form

F IGURE 1.3: GPS moving map

F IGURE 1.4: Requirements form for GPS moving map system