Microcontroller and Embedded Systems 21cs43 Mes Vtu Notes 2021
21CS43
ARM PROGRAMMING USING ASSEMBLY LANGUAGE
WRITING ASSEMBLY CODE:
This section gives examples showing how to write basic assembly code. Also, this section uses the ARM
macro assembler armasm for examples.
Example 1: This example shows how to convert a C function to an assembly function, usually the
first stage of assembly optimization. Consider the following simple C program, main.c, which
prints the squares of the integers from 0 to 9:
Let's see how to replace square by an assembly function that performs the same action. Remove the C
definition of square, but not the declaration (the second line), to produce a new C file main1.c. Next add
an armasm assembler file square.s with the following contents:
The AREA directive names the area or code section that the code lives in. If you use non-
alphanumeric characters in a symbol or area name, then enclose the name in vertical bars. Many
non-alphanumeric characters have special meanings otherwise. In the previous code we define a
read-only code area called .text.
The EXPORT directive makes the symbol square available for external linking. At line six we
define the symbol square as a code label. Note that armasm treats non-indented text as a label
definition.
When square is called, the parameter passing is defined by the ARM-Thumb procedure call
standard (ATPCS). The input argument is passed in register r0, and the return value is returned in
register r0. The multiply instruction has a restriction that the destination register must not be the
same as the first argument register. Therefore we place the multiply result into r1 and move this
to r0.
The END directive marks the end of the assembly file. Comments follow a semicolon.
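The listings for Example 1 are not reproduced in these notes. As a rough sketch of the C starting point, assuming only what the description states (a square function declared on the second line and a loop over 0 to 9), the program might look like the following. The helper name print_squares is hypothetical and stands in for the body of main.

```c
#include <stdio.h>
int square(int i);   /* declaration: kept in main1.c when square moves to assembly */

/* Hypothetical stand-in for the body of main: prints the squares of 0..9
   and returns the number of lines printed. */
int print_squares(void)
{
    int i;
    for (i = 0; i < 10; i++) {
        printf("Square of %d is %d\n", i, square(i));
    }
    return i;
}

/* Definition: the part that an assembly file such as square.s replaces. */
int square(int i)
{
    return i * i;
}
```

Removing only the definition at the bottom leaves the caller unchanged, which is why the declaration must stay.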
The following script illustrates how to build this example using command line tools.
Example 1 only works if you are compiling your C as ARM code. If you compile your C as Thumb code,
then the assembly routine must return using a BX instruction.
Example 2: When calling ARM code from C compiled as Thumb, the only change required to the
assembly in Example 1 is to change the return instruction to a BX. BX will return to ARM or Thumb state
according to bit 0 of lr. Therefore this routine can be called from ARM or Thumb. Use BX lr instead of
MOV pc, lr whenever your processor supports BX (ARMv4T and above). Create a new assembly file
square2.s as follows:
With this example we build the C file using the Thumb C compiler tcc. We assemble the assembly file
with the interworking flag enabled so that the linker will allow the Thumb C code to call the ARM
assembly code. You can use the following commands to build this example:
Example 3: This example shows how to call a subroutine from an assembly routine. We will take
Example 1 and convert the whole program (including main) into assembly. We will call the C library
routine printf as a subroutine. Create a new assembly file main3.s with the following contents:
The IMPORT directive is used to declare symbols that are defined in other files.
The imported symbol Lib$$Request$$armlib makes a request that the linker links with the
standard ARM C library.
o The WEAK specifier prevents the linker from giving an error if the symbol is not found at
link time. If the symbol is not found, it will take the value zero.
The second imported symbol __main is the start of the C library initialization code.
You only need to import these symbols if you are defining your own main; a main defined in C code will
import these automatically for you. Importing printf allows us to call that C library function.
The RN directive allows us to use names for registers. In this case we define i as an alternate
name for register r4.
o Using register names makes the code more readable. It is also easier to change the
allocation of variables to registers at a later date. Recall that ATPCS states that a function
must preserve registers r4 to r11 and sp. We corrupt i (r4), and calling printf will corrupt
lr. Therefore we stack these two registers at the start of the function using an STMFD
instruction. The LDMFD instruction pulls these registers from the stack and returns by
writing the return address to pc.
The DCB directive defines byte data described as a string or a comma-separated list of bytes.
To build this example you can use the following command line script:
Note that Example 3 also assumes that the code is called from ARM code. If the code can be called from
Thumb code as in Example 2 then we must be capable of returning to Thumb code. For architectures
before ARMv5 we must use a BX to return. Change the last instruction to the two instructions:
Example 4: This example defines a function sumof that can sum any number of integers. The arguments
are the number of integers to sum followed by a list of the integers. The sumof function is written in
assembly and can accept any number of arguments. Put the C part of the example in a file main4.c:
The code keeps count of the number of remaining values to sum, N. The first three values are in registers
r1, r2, r3. The remaining values are on the stack (Recall that ATPCS places the first four arguments in
registers r0 to r3. Subsequent arguments are placed on the stack). You can build this example using the
commands –
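Independently of the build step, the behaviour expected of sumof can be modelled in C. This is only a reference model using the standard variadic macros, not the assembly routine the example describes; it is useful for checking results from the assembly version.

```c
#include <stdarg.h>

/* Reference model of sumof: N is the number of values,
   followed by N int arguments to be added together. */
int sumof(int N, ...)
{
    va_list ap;
    int sum = 0;

    va_start(ap, N);
    while (N-- > 0) {
        sum += va_arg(ap, int);   /* fetch the next int argument */
    }
    va_end(ap);
    return sum;
}
```

A call such as sumof(6, 1, 2, 3, 4, 5, 6) passes more than four arguments, so under ATPCS the last three values travel on the stack rather than in registers.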
PROFILING AND CYCLE COUNTING:
A cycle counter measures the number of cycles taken by a specific routine. You can measure
your success by using a cycle counter to benchmark a given subroutine before and after an
optimization.
The ARM simulator used by the ADS 1.1 debugger is called the ARMulator; it provides profiling
and cycle-counting features.
o The ARMulator profiler works by sampling the program counter pc at regular intervals.
The profiler identifies the function the pc points to and updates a hit counter for each
function it encounters. Another approach is to use the trace output of a simulator as a
source for analysis.
o The accuracy of a pc-sampled profiler is limited, as it can produce meaningless results if
it records too few samples.
ARM implementations do not normally contain cycle-counting hardware, so the easiest way to measure
cycle counts is to use an ARM debugger with an ARM simulator.
o You can configure the ARMulator to simulate a range of different ARM cores and obtain
cycle count benchmarks for a number of platforms.
INSTRUCTION SCHEDULING:
The time taken to execute instructions depends on the implementation pipeline. For this section, we
assume ARM9TDMI pipeline timings. The following rules summarize the cycle timings for common
instruction classes on the ARM9TDMI.
Instructions that are conditional on the value of the ARM condition codes in the cpsr take one cycle if the
condition is not met. If the condition is met, then the following rules apply:
ALU operations such as addition, subtraction, and logical operations take one cycle.
This includes a shift by an immediate value. If you use a register-specified shift, then add one
cycle. If the instruction writes to the pc, then add two cycles.
Load instructions that load N 32-bit words of memory such as LDR and LDM take N cycles to
issue, but the result of the last word loaded is not available on the following cycle.
o The updated load address is available on the next cycle. This assumes zero-wait-state
memory for an un-cached system, or a cache hit for a cached system. An LDM of a single
value is exceptional, taking two cycles. If the instruction loads pc, then add two cycles.
o Load instructions that load 16-bit or 8-bit data such as LDRB, LDRSB, LDRH, and
LDRSH take one cycle to issue. The load result is not available on the following two
cycles. The updated load address is available on the next cycle. This assumes zero-wait-
state memory for an un-cached system, or a cache hit for a cached system.
Branch instructions take three cycles.
Store instructions that store N values take N cycles. This assumes zero-wait-state memory for an
un-cached system, or a cache hit or a write buffer with N free entries for a cached system. An
STM of a single value is exceptional, taking two cycles.
Multiply instructions take a varying number of cycles depending on the value of the second
operand in the product.
To understand how to schedule code efficiently on the ARM, we need to understand the ARM pipeline
and dependencies. The ARM9TDMI processor performs five operations in parallel:
Fetch: Fetch from memory the instruction at address pc. The instruction is loaded into the core
and then proceeds down the core pipeline.
Decode: Decode the instruction that was fetched in the previous cycle. The processor also reads
the input operands from the register bank if they are not available via one of the forwarding paths.
ALU: Executes the instruction that was decoded in the previous cycle. Note this instruction was
originally fetched from address pc − 8 (ARM state) or pc − 4 (Thumb state).
o Normally this involves calculating the answer for a data processing operation, or the
address for a load, store, or branch operation.
o Some instructions may spend several cycles in this stage. For example, multiply and
register-controlled shift operations take several ALU cycles.
LS1: Load or store the data specified by a load or store instruction. If the instruction is not a load
or store, then this stage has no effect.
LS2: Extract and zero- or sign-extend the data loaded by a byte or half-word load instruction. If
the instruction is not a load of an 8-bit byte or 16-bit half-word item, then this stage has no effect.
The following Figure shows a simplified functional view of the five-stage ARM9TDMI pipeline.
Note that multiply and register shift operations are not shown in the figure.
After an instruction has completed the five stages of the pipeline, the core writes the result to the register
file. Note that pc points to the address of the instruction being fetched. The ALU is executing the
instruction that was originally fetched from address pc − 8 in parallel with fetching the instruction at
address pc.
How does the pipeline affect the timing of instructions? Consider the following examples. These
examples show how the cycle timings change because an earlier instruction must complete a stage before
the current instruction can progress down the pipeline.
If an instruction requires the result of a previous instruction that is not available, then the processor stalls.
This is called a pipeline hazard or pipeline interlock.
This instruction pair takes two cycles. The ALU calculates r0 + r1 in one cycle. Therefore this result is
available for the ALU to calculate r0 + r2 in the second cycle.
This instruction pair takes three cycles. The ALU calculates the address r2 + 4 in the first cycle while
decoding the ADD instruction in parallel. However, the ADD cannot proceed on the second cycle because
the load instruction has not yet loaded the value of r1. Therefore the pipeline stalls for one cycle while the
load instruction completes the LS1 stage. Now that r1 is ready, the processor executes the ADD in the
ALU on the third cycle.
The following Figure illustrates how this interlock affects the pipeline.
The processor stalls the ADD instruction for one cycle in the ALU stage of the pipeline while the load
instruction completes the LS1 stage. The figure denotes this stall by an italic ADD. Since the LDR
instruction proceeds down the pipeline but the ADD instruction is stalled, a gap opens up between them.
This gap is sometimes called a pipeline bubble. We’ve marked the bubble with a dash.
Example 7: This example shows a one-cycle interlock caused by delayed load use.
This instruction triplet takes four cycles. Although the ADD proceeds on the cycle following the load
byte, the EOR instruction cannot start on the third cycle. The r1 value is not ready until the load
instruction completes the LS2 stage of the pipeline. The processor stalls the EOR instruction for one cycle.
Note that the ADD instruction does not affect the timing at all. The sequence takes four cycles whether it
is there or not! The following Figure shows how this sequence progresses through the processor pipeline.
The ADD doesn’t cause any stalls since the ADD does not use r1, the result of the load.
Example 8: This example shows why a branch instruction takes three cycles. The processor must flush
the pipeline when jumping to a new address.
The three executed instructions take a total of five cycles. The MOV instruction executes on the first
cycle. On the second cycle, the branch instruction calculates the destination address. This causes the core
to flush the pipeline and refill it using this new pc value. The refill takes two cycles. Finally, the SUB
instruction executes normally. The following Figure illustrates the pipeline state on each cycle. The
pipeline drops the two instructions following the branch when the branch takes place.
Scheduling of Load Instructions:
Load instructions occur frequently in compiled code, accounting for approximately one-third of all
instructions. Careful scheduling of load instructions so that pipeline stalls don’t occur can improve
performance. The compiler attempts to schedule the code as best it can, but the aliasing problem of C
limits the available optimizations. The compiler cannot move a load instruction before a store instruction
unless it is certain that the two pointers used do not point to the same address.
Consider an example of a memory-intensive task. The following function, str_tolower, copies a zero-
terminated string of characters from in to out. It converts the string to lowercase in the process.
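The C listing for str_tolower is missing from these notes. A minimal sketch consistent with the description (copy from in to out, convert to lowercase, stop after the terminating zero) could be written as follows; the exact parameter types are an assumption.

```c
/* Copy the zero-terminated string at in to out, converting
   uppercase ASCII letters to lowercase along the way. */
void str_tolower(char *out, const char *in)
{
    unsigned int c;

    do {
        c = (unsigned char)*in++;          /* load the next character */
        if (c >= 'A' && c <= 'Z') {
            c = c + ('a' - 'A');           /* shift into the lowercase range */
        }
        *out++ = (char)c;                  /* store, including the final zero */
    } while (c != 0);
}
```

Every statement after the load depends on c, which is the dependency that limits how the compiler can schedule the loop.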
The compiler generates the above compiled output. Notice that the compiler optimizes the condition (c
>= ‘A’ && c <= ‘Z’) to the check that 0 <= c-‘A’ <= ‘Z’-‘A’. The compiler can perform this check
using a single unsigned comparison.
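The single-comparison trick can be checked in C. The sketch below, with hypothetical function names, compares the two-test form against the unsigned form: when c is below 'A', the subtraction wraps around to a very large unsigned value, so one comparison rejects both out-of-range cases.

```c
/* Range test written as two signed comparisons. */
int is_upper_two(int c)
{
    return c >= 'A' && c <= 'Z';
}

/* Same test as one unsigned comparison: 0 <= c - 'A' <= 'Z' - 'A'. */
int is_upper_one(int c)
{
    return (unsigned int)c - 'A' <= (unsigned int)('Z' - 'A');
}
```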
Unfortunately, the SUB instruction uses the value of c directly after the LDRB instruction that loads c.
Consequently, the ARM9TDMI pipeline will stall for two cycles. The compiler can’t do any better since
everything following the load of c depends on its value.
However, there are two ways you can alter the structure of the algorithm to avoid these stall cycles by
using assembly. We call these methods load scheduling by preloading and unrolling.
REGISTER ALLOCATION:
You can use 14 of the 16 visible ARM registers to hold general-purpose data. The other two registers are
the stack pointer, r13, and the program counter, r15. For a function to be ATPCS compliant it must
preserve the callee values of registers r4 to r11. ATPCS also specifies that the stack should be eight-byte
aligned; therefore you must preserve this alignment if calling subroutines. Use the following template for
optimized assembly routines requiring many registers:
The only purpose in stacking r12 is to keep the stack eight-byte aligned. You need not stack r12 if your
routine doesn’t call other ATPCS routines. For ARMv5 and above you can use the preceding template
even when being called from Thumb code. If your routine may be called from Thumb code on an
ARMv4T processor, then modify the template as follows:
Using More Than 14 Local Variables:
If you need more than 14 local 32-bit variables in a routine, then you must store some variables on the
stack. The standard procedure is to work outwards from the innermost loop of the algorithm, since the
innermost loop has the greatest performance impact.
CONDITIONAL EXECUTION:
The processor core can conditionally execute most ARM instructions. This conditional execution is based
on one of 15 condition codes. If you don’t specify a condition, the assembler defaults to the
execute-always condition (AL). The other 14 conditions split into seven pairs of complements. The
conditions depend on
the four condition code flags N, Z, C, V stored in the cpsr register.
By default, ARM instructions do not update the N, Z, C, V flags in the ARM cpsr. For most instructions,
to update these flags you append an S suffix to the instruction mnemonic.
Exceptions to this are comparison instructions that do not write to a destination register. Their sole
purpose is to update the flags and so they don’t require the S suffix.
By combining conditional execution and conditional setting of the flags, you can implement simple if
statements without any need for branches. This improves efficiency since branches can take many cycles
and also reduces code size.
Example 17: The following C code converts an unsigned integer 0 ≤ i ≤ 15 to a hexadecimal character c:
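The C listing for this example is not reproduced in the notes. A minimal sketch matching the description is given below; the function name and the choice of uppercase letters for the digits A to F are assumptions.

```c
/* Convert an integer 0 <= i <= 15 to a hexadecimal digit character. */
char to_hex_digit(unsigned int i)
{
    char c;

    if (i < 10) {
        c = (char)(i + '0');        /* 0..9   -> '0'..'9' */
    } else {
        c = (char)(i + 'A' - 10);   /* 10..15 -> 'A'..'F' */
    }
    return c;
}
```

In assembly, this if/else maps onto one compare followed by two conditionally executed additions, with no branch.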
The sequence works since the first ADD does not change the condition codes. The second ADD is still
conditional on the result of the compare.
Conditional execution is even more powerful for cascading conditions.
As soon as one of the TEQ comparisons detects a match, the Z flag is set in the cpsr. The following
TEQNE instructions have no effect as they are conditional on Z = 0. The next instruction to have effect is
the ADDEQ that increments vowel. You can use this method whenever all the comparisons in the if
statement are of the same type.
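The C statement behind this cascade is missing from the notes. Assuming it tests a character against the five lowercase vowels and increments a counter called vowel, as the description of ADDEQ suggests, it could be sketched as:

```c
/* Increment *vowel when c is one of the lowercase vowels.
   Every comparison is an equality test, so a chain of
   conditional compares can evaluate the whole condition. */
void count_vowel(int c, int *vowel)
{
    if (c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u') {
        (*vowel)++;
    }
}
```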
To implement this efficiently, we can use an addition or subtraction to move each range to the form 0 ≤ c
≤ limit. Then we use unsigned comparisons to detect this range and conditional comparisons to chain
together ranges. The following assembly implements this efficiently:
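The assembly listing is not reproduced here. The same idea can be sketched in C, under the assumption that the ranges being tested are the uppercase and lowercase letters: each range is shifted down to start at zero and then checked with one unsigned comparison.

```c
/* Returns nonzero when c is an ASCII letter.
   Each range is moved to the form 0 <= u - base <= limit, so a
   single unsigned comparison per range is enough. */
int is_letter(int c)
{
    unsigned int u = (unsigned int)c;

    return (u - 'A' <= (unsigned int)('Z' - 'A')) ||
           (u - 'a' <= (unsigned int)('z' - 'a'));
}
```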
Note that the logical operations AND and OR are related by the standard logical relations as shown in the
following Table. You can invert logical expressions involving OR to get an expression involving AND,
which can often be useful in simplifying or rearranging logical expressions.
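The table is not reproduced in these notes. The standard relations between AND and OR are De Morgan's laws, which can be checked exhaustively in C. The function names below are hypothetical.

```c
/* NOT (a OR b) is equivalent to (NOT a) AND (NOT b). */
int demorgan_or_holds(int a, int b)
{
    return (!(a || b)) == (!a && !b);
}

/* NOT (a AND b) is equivalent to (NOT a) OR (NOT b). */
int demorgan_and_holds(int a, int b)
{
    return (!(a && b)) == (!a || !b);
}
```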
LOOPING CONSTRUCTS:
Most routines critical to performance will contain a loop. Note that ARM loops are fastest when they
count down towards zero. This section describes how to implement these loops efficiently in assembly.
We also look at examples of how to unroll loops for maximum performance.
The loop overhead consists of a subtraction setting the condition codes followed by a conditional branch.
On ARM7 and ARM9 this overhead costs four cycles per loop. If i is an array index, then you may want to
count down from N−1 to 0 inclusive instead so that you can access array element zero. You can
implement this in the same way by using a different conditional branch:
In this arrangement the Z flag is set on the last iteration of the loop and cleared for other iterations. If
there is anything different about the last loop, then we can achieve this using the EQ and NE conditions.
For example, if you preload data for the next loop, then you want to avoid the preload on the last loop.
You can make all preload operations conditional on NE.
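The down-counting pattern from N−1 to 0 can be sketched in C as an array sum. This is an illustrative model with a hypothetical function name, not the assembly loop itself.

```c
/* Sum a[0..N-1], indexing from N-1 down to 0 inclusive. */
int sum_down(const int *a, int N)
{
    int sum = 0;
    int i;

    /* Continue while the decremented index is still non-negative,
       so that element zero is included. */
    for (i = N - 1; i >= 0; i--) {
        sum += a[i];
    }
    return sum;
}
```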
There is no reason why we must decrement by one on each loop. Suppose we require N/3 loops; rather
than attempting to divide N by three, it is far more efficient to subtract three from the loop counter on
each iteration:
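The assembly listing is not reproduced here. In C, the same idea can be sketched as a counter that steps down by three; the function name is hypothetical, and the loop runs exactly N/3 times only when N is a multiple of three (otherwise it rounds up).

```c
/* Count how many times the loop body runs when the counter
   starts at N and is reduced by three on each iteration. */
unsigned int loops_by_three(int N)
{
    unsigned int iterations = 0;
    int i;

    for (i = N; i > 0; i -= 3) {   /* no division needed */
        iterations++;
    }
    return iterations;
}
```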
Negative Indexing: This loop structure counts from −N to 0 (inclusive or exclusive) in steps of size
STEP.
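As an illustration of negative indexing, the following C sketch walks an array with an index that runs from −N up to, but not including, zero. It assumes the base pointer is set to the end of the array; the function name is hypothetical.

```c
/* Sum every step-th element of a[0..N-1] using a negative index. */
int sum_negative_index(const int *a, int N, int step)
{
    const int *end = a + N;   /* one past the last element */
    int sum = 0;
    int i;

    for (i = -N; i < 0; i += step) {
        sum += end[i];        /* end[i] is the same element as a[N + i] */
    }
    return sum;
}
```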
Logarithmic Indexing: This loop structure counts down from 2^N to 1 in powers of two. For example, if N
= 4, then it counts 16, 8, 4, 2, 1.
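A C sketch of this counting pattern, assuming N is smaller than the width of unsigned int and using a hypothetical function name, records each counter value so the sequence can be inspected.

```c
/* Store the counter values 2^N, 2^(N-1), ..., 1 in out and
   return how many values were written (N + 1). */
unsigned int log_index(unsigned int N, unsigned int *out)
{
    unsigned int n = 0;
    unsigned int i;

    for (i = 1u << N; i != 0; i >>= 1) {   /* halve until the bit falls off */
        out[n++] = i;
    }
    return n;
}
```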
The following loop structure counts down from an N-bit mask to a one-bit mask. For example, if N = 4,
then it counts 15, 7, 3, 1.
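The mask variant can be sketched the same way in C, again assuming N is smaller than the width of unsigned int; the function name is hypothetical.

```c
/* Store the masks 2^N - 1, 2^(N-1) - 1, ..., 1 in out and
   return how many values were written (N). */
unsigned int mask_index(unsigned int N, unsigned int *out)
{
    unsigned int n = 0;
    unsigned int i;

    for (i = (1u << N) - 1u; i != 0; i >>= 1) {   /* drop the top bit each time */
        out[n++] = i;
    }
    return n;
}
```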
MODULE – 4
EMBEDDED SYSTEM DESIGN COMPONENTS
Power Concerns:
Power management is another important factor that needs to be considered in designing
embedded systems.
Embedded systems should be designed in such a way as to minimize the heat dissipation by the
system.
MICROCONTROLLER AND EMBEDDED SYSTEMS
18CS44
The production of a large amount of heat demands cooling provisions such as cooling fans, which in
turn occupy additional space and make the system bulky.
Nowadays ultra-low-power components are available in the market. Build the design around
low-power components such as low-dropout regulators and controllers/processors with
power-saving modes.
Power management is also a critical constraint in battery-operated applications. The higher the
power consumption, the shorter the battery life.
2. Throughput: deals with the efficiency of a system. Throughput is defined as the rate of
production or operation of a defined process over a stated period of time.
o The rates can be expressed in terms of units of products, batches produced, or any other
meaningful measurements.
o In case of a Card Reader, throughput means how many transactions the Reader can
perform in a minute or in an hour or in a day.
o Throughput is generally measured in terms of 'Benchmark'.
o A 'Benchmark' is a reference point by which something can be measured.
o Benchmark can be a set of performance criteria that a product is expected to meet or a
standard product that can be used for comparing other products of the same product line.
3. Reliability: is a measure of how far (in percentage terms) you can rely upon the proper functioning
of the system, or equivalently of how susceptible the system is to failures.
o Mean Time Between Failures (MTBF) and Mean Time To Repair (MTTR) are the terms used
in defining system reliability.
o MTBF gives the frequency of failures in hours/ weeks/ months.
o MTTR specifies how long the system is allowed to be out of order following a failure.
o For an embedded system with critical application needs, it should be of the order of minutes.
4. Maintainability: deals with support and maintenance to the end user or client in case of technical
issues and product failures or on the basis of a routine system checkup.
o Reliability and Maintainability are considered as two complementary disciplines.
o A more reliable system has fewer corrective maintenance requirements, and vice versa. As the
reliability of the system increases, the chances of failure and non-functioning reduce, and
with them the need for maintenance.
o Maintainability is closely related to the system availability. Maintainability can be broadly
classified into two categories, namely, 'Scheduled or Periodic Maintenance (preventive
maintenance)' and 'Maintenance to unexpected failures (corrective maintenance)'.
o Some embedded products may use consumable components or may contain components
which are subject to wear and tear and they should be replaced on a periodic basis. The
period may be based on the total hours of the system usage or the total output the system
delivered.
o A printer is a typical example for illustrating the two types of maintainability. An inkjet
printer uses ink cartridges, which are consumable components and as per the printer
manufacturer the end user should replace the cartridge after each 'n' number of printouts,
to get quality prints. This is an example for 'Scheduled or Periodic maintenance'.
o If the paper feeding part of the printer fails the printer fails to print and it requires
immediate repairs to rectify this problem. This is an example of 'Maintenance to
unexpected failure'.
o In both kinds of maintenance (scheduled and repair), the printer needs to be brought offline
and during this time it will not be available to the user.
o In any embedded system design, the ideal value for availability is expressed as
Ai = MTBF / (MTBF + MTTR)
Where, Ai – Availability in the ideal conditions.
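As a worked illustration of the availability formula, the following C sketch computes Ai from MTBF and MTTR, both expressed in the same time unit. The function name is hypothetical.

```c
/* Ideal availability: Ai = MTBF / (MTBF + MTTR).
   mtbf and mttr must use the same unit (for example, hours). */
double availability(double mtbf, double mttr)
{
    return mtbf / (mtbf + mttr);
}
```

For instance, a system with an MTBF of 3 hours and an MTTR of 1 hour is available 75% of the time. Raising the MTBF or shortening the MTTR both move Ai towards 1.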
5. Security: aspect covers ‘Confidentiality’, 'Integrity', and 'Availability' (The term 'Availability'
mentioned here is not related to the term 'Availability' mentioned under the 'Maintainability'
section).
o Confidentiality deals with the protection of data and application from unauthorized
disclosure.
o Integrity deals with the protection of data and application from unauthorized modifications.
o Availability deals with protection of data and application from unauthorized users.
o A very good example of the 'Security' aspect in an embedded product is a Personal
Digital Assistant (PDA). The PDA can be either a shared resource (e.g. PDAs used in
LAB setups) or an individual one.
o If it is a shared one, there should be some mechanism, such as a user name and password, to
control access to a particular person's profile. This is an example of Availability.
o Also, not all data and applications present in the PDA need be accessible to all users. Some of
them are accessible to administrators only. To achieve this, administrator and user levels of
security should be implemented. This is an example of Confidentiality.
o Some data present in the PDA may be visible to all users but there may not be necessary
permissions to alter the data by the users. That is Read Only access is allocated to all users –
An example of Integrity.
6. Safety: 'Safety' and 'Security' are easily confused, and it is tempting to treat them as a
single attribute, but they represent two distinct quality attributes.
o Safety deals with the possible damage that can happen to the operators, the public and the
environment due to the breakdown of an embedded system or the emission of radioactive or
hazardous materials from embedded products.
o The breakdown of an embedded system may occur due to a hardware failure or a firmware
failure.
o Safety analysis is a must in product engineering to evaluate the anticipated damages and
determine the best course of action to bring down the consequences of the damages to an
acceptable level.
o Some of the safety threats are sudden (like product breakdown) and some of them are gradual
(like hazardous emissions from the product).
4. Time to prototype and market: is the time elapsed between the conceptualization of a product
and the time at which the product is ready for selling (for commercial product) or use (for non-
commercial products).
o The commercial embedded product market is highly competitive, and time to market is a
critical factor in the success of a commercial embedded product. There may be multiple
players in the embedded industry who develop products of the same category (like mobile
phones, portable media players, etc.). If you come up with a new design and it takes a long
time to develop and market it, a competitor may take advantage of the delay with their own
product.
o Also, embedded technology is a field where technology changes rapidly. If you start your
design by making use of a new technology and it takes a long time to develop and market the
product, then by the time you market it the technology might have been superseded by a
newer one.
o Product prototyping helps a lot in reducing time-to-market. Whenever you have a product
idea, you may not be certain about the feasibility of the idea.
o Prototyping is an informal kind of rapid product development in which the important features
of the product under consideration are developed.
o The time to prototype is also another critical factor. If the prototype is developed faster, the
actual estimated development time can be brought down significantly. In order to shorten the
time to prototype, make use of all possible options like the use of off-the-shelf components,
re-usable assets, etc.
5. Per unit and total cost: is a factor which is closely monitored by both end user (those who buy
the product) and product manufacturer (those who build the product).
o Cost is a highly sensitive factor for commercial products. Any failure to position the cost of a
commercial product at a nominal rate, may lead to the failure of the product in the market.
Proper market study and cost benefit analysis should be carried out before taking a decision
on the per-unit cost of the embedded product.
o From a designer/ product development company perspective the ultimate aim of a product is
to generate marginal profit. So the budget and total system cost should be properly balanced
to provide a marginal profit.
The Product Life Cycle (PLC): Every embedded product has a product life cycle which starts with the
design and development phase.
Product idea generation, prototyping, roadmap definition, and actual product design and
development are the activities carried out during this phase.
During the design and development phase there is only investment and no returns.
Once the product is ready to sell, it is introduced to the market. This stage is known as the
Product Introduction stage.
During the initial period, sales and revenue will be low. There won't be much competition, and
product sales and revenue increase with time. In the growth phase, the product grabs a high
market share.
During the maturity phase, growth and sales will be steady and the revenue reaches its peak.
The Product Retirement/Decline phase starts with a drop in sales volume, market share and
revenue. The decline happens for various reasons, such as competition from similar products with
enhanced features, technology changes, etc. At some point in the decline stage, the
manufacturer announces the discontinuation of the product.
The different stages of the embedded product life cycle (revenue, unit cost and profit in each
stage) are represented in the following product life-cycle graph.
The Functional block diagram of washing machine is shown in the following figure:
Washing machines come in two models, namely top-loading and front-loading machines.
In top-loading models the agitator of the machine twists back and forth and pulls the clothes down
to the bottom of the tub. On reaching the bottom of the tub the clothes work their way back up to
the top of the tub, where the agitator grabs them again and repeats the mechanism.
In front-loading machines, the clothes are tumbled and plunged into the water over and over
again. This is the first phase of washing.
In the second phase of washing, water is pumped out from the tub and the inner tub uses
centrifugal force to wring out more water from the clothes by spinning at several hundred
Rotations Per Minute (RPM). This is called a 'Spin Phase'.
If you look into the keyboard panel of your washing machine you can see three buttons: Wash,
Spin and Rinse. You can use these buttons to configure the washing stages.
As you can see from the picture, the inner tub of the machine contains a number of holes and
during the spin cycle the inner tub spins, and forces the water out through these holes to the
stationary outer tub from which it is drained off through the outlet pipe.
Note that the design of washing machines may vary from manufacturer to manufacturer, but the
general working principle of the washing machine remains the same.
The basic controls consist of a timer, cycle selector mechanism, water temperature selector, load
size selector and start button.
The mechanism includes the motor, transmission, clutch, pump, agitator, inner tub, outer tub and
water inlet valve. The water inlet valve connects to the household water supply line and regulates
the flow of water into the tub.
The integrated control panel consists of a microprocessor/controller based board with I/O
interfaces and a control algorithm running in it. The input interface includes the keyboard, which
consists of the wash type selector (Wash, Spin and Rinse), the cloth type selector (Light, Medium
and Heavy duty), the washing time setting, etc.
The output interface consists of LED/ LCD displays, status indication LEDs, etc. connected to the
I/O bus of the controller.
The other types of I/O interfaces, which are invisible to the end user, are different kinds of sensor
interfaces (water temperature sensor, water level sensor, etc.) and actuator interfaces, including
motor control for agitator and tub movement, inlet water flow control, etc.
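As a rough illustration of the control algorithm mentioned above, the sketch below sequences the washing stages selected on the keyboard. The stage names, the `stage_config_t` structure and the fixed order wash, rinse, spin are assumptions made for this example; the notes only name the Wash, Spin and Rinse buttons.

```c
#include <stdbool.h>

/* Hypothetical stage identifiers for the control algorithm. */
typedef enum { STAGE_IDLE, STAGE_WASH, STAGE_RINSE, STAGE_SPIN, STAGE_DONE } stage_t;

/* Which stages the user enabled through the Wash, Rinse and Spin buttons. */
typedef struct {
    bool wash;
    bool rinse;
    bool spin;
} stage_config_t;

/* Return the next enabled stage after 'current' (assumed order: wash, rinse, spin). */
stage_t next_stage(stage_t current, const stage_config_t *cfg)
{
    switch (current) {
    case STAGE_IDLE:
        if (cfg->wash)  return STAGE_WASH;
        /* fall through: wash not selected */
    case STAGE_WASH:
        if (cfg->rinse) return STAGE_RINSE;
        /* fall through: rinse not selected */
    case STAGE_RINSE:
        if (cfg->spin)  return STAGE_SPIN;
        /* fall through: spin not selected */
    default:
        return STAGE_DONE;
    }
}
```

A real controller would also read the water level and temperature sensors and drive the motor and inlet valve inside each stage; that part is left out here.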
The various types of electronic control units (ECUs) used in the automotive embedded
industry can be broadly classified into two: High-speed embedded control units and Low-speed
embedded control units.
High-speed Electronic Control Units (HECUs): are deployed in critical control units requiring fast
response. They include fuel injection systems, antilock brake systems, engine control, electronic throttle,
steering controls, transmission control unit and central control unit.
Low-speed Electronic Control Units (LECUs): are deployed in applications where response time is not
so critical. They generally are built around low cost microprocessors/ microcontrollers and digital signal
processors. Audio controllers, passenger and driver door locks, door glass controls (power windows),
wiper control, mirror control, seat control systems, head lamp and tail lamp controls, sun roof control unit
etc., are examples of LECUs.
Automotive Communication Buses:
Automotive applications make use of serial buses for communication, which greatly reduces the amount
of wiring required inside a vehicle. The different types of serial interface buses deployed in automotive
embedded applications are –
1. Controller Area Network (CAN): The CAN bus was originally proposed by Robert Bosch, a
pioneer among automotive embedded solution providers.
o CAN supports medium speed (ISO 11519, class B, with data rates up to 125 Kbps) and high
speed (ISO 11898, class C, with data rates up to 1 Mbps) data transfer.
o CAN is an event-driven protocol interface with support for error handling in data
transmission.
o It is generally employed in safety systems like airbag control; power train systems like engine
control and Antilock Brake System (ABS); and navigation systems like GPS.
2. Local Interconnect Network (LIN): The LIN bus is a single master, multiple slave (up to 16
independent slave nodes) communication interface.
o LIN is a low speed, single wire communication interface with support for data rates up to 20
Kbps and is used for sensor/ actuator interfacing.
o LIN bus follows the master communication triggering technique to eliminate the possible bus
arbitration problem that can occur by the simultaneous talking of different slave nodes
connected to a single interface bus.
o LIN bus is employed in applications like mirror controls, fan controls, seat positioning
controls, window controls, and position controls where response time is not a critical issue.
3. Media Oriented System Transport (MOST) Bus: MOST is targeted at automotive audio/
video equipment interfacing and is used primarily in European cars.
o A MOST bus is a multimedia fibre-optic point-to-point network implemented in a star, ring
or daisy-chained topology over optical fibre cables.
o The MOST bus specifications define the physical (electrical and optical parameters) layer as
well as the application layer, network layer, and media access control.
o Physically, the MOST bus is an optical fibre cable connected between an Electrical Optical
Converter (EOC) and an Optical Electrical Converter (OEC); the converters translate between
the electrical signals of the devices and the optical signals on the cable.
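The master communication triggering technique described for the LIN bus above can be sketched as follows. This is a simplified model, not the actual LIN frame format: the `lin_slave_t` structure, the one-byte response and the function name are assumptions for illustration. The point it shows is that a slave never talks on its own; it only answers when the master puts its identifier on the bus, so no two slaves can collide.

```c
#include <stdint.h>
#include <stddef.h>

#define LIN_MAX_SLAVES 16   /* upper bound quoted in the notes */

/* Hypothetical slave model: each slave owns one frame identifier and one data byte. */
typedef struct {
    uint8_t frame_id;
    uint8_t data;
} lin_slave_t;

/* The master sends 'frame_id'; only the slave that owns it responds.
 * Returns 1 and stores the response in *out, or 0 if no slave owns the identifier. */
int lin_master_poll(const lin_slave_t *slaves, size_t count,
                    uint8_t frame_id, uint8_t *out)
{
    if (count > LIN_MAX_SLAVES) {
        count = LIN_MAX_SLAVES;
    }
    for (size_t i = 0; i < count; i++) {
        if (slaves[i].frame_id == frame_id) {
            *out = slaves[i].data;   /* the single addressed slave answers */
            return 1;
        }
    }
    return 0;                        /* nobody answers: the bus stays silent */
}
```

The master would typically call this for each identifier in a fixed schedule, which is why LIN needs no bus arbitration.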
Key Players of the Automotive Embedded Market:
The key players of the automotive embedded market can be visualized in three verticals namely, silicon
providers, tools and platform providers and solution providers.
1. Silicon Providers: are responsible for providing the necessary chips which are used in the control
application development.
o The chip may be a standard product like microcontroller or DSP or ADC/ DAC chips.
o Some applications may require specific chips, which are manufactured as Application
Specific Integrated Circuits (ASICs).
o The leading silicon providers in the automotive industry are:
a) Analog Devices (www.analog.com): Provider of world class digital signal processing
chips, precision analog microcontrollers, programmable inclinometer/accelerometer,
LED drivers, etc. for automotive signal processing applications, driver assistance
systems, audio system, GPS/Navigation system, etc.
b) Xilinx (www.xilinx.com): Supplier of high performance FPGAs, CPLDs and automotive
specific IP cores for GPS navigation systems, driver information systems, distance
control, collision avoidance, rear-seat entertainment, adaptive cruise control, voice
recognition, etc.
c) Atmel (www.atmel.com): Supplier of cost-effective high-density Flash controllers and
memories. Atmel provides a series of high performance microcontrollers, namely,
ARM® and 80C51. A wide range of Application Specific Standard Products (ASSPs) for
chassis, body electronics, security, safety and car infotainment and automotive
networking products for CAN, LIN and FlexRay are also supplied by Atmel.
d) Maxim/Dallas (www.maxim-ic.com): Supplier of world class analog, digital and mixed
signal products (Microcontrollers, ADC/ DAC, amplifiers, comparators, regulators, etc),
RF components, etc. for all kinds of automotive solutions.
e) NXP semiconductor (www.nxp.com): Supplier of 8/ 16/ 32-bit Flash microcontrollers.
f) Texas Instruments (www.ti.com): Supplier of microcontrollers, digital signal
processors and automotive communication control chips for Local Interconnect Network (LIN)
bus products.
2. Tool and Platform Providers: are manufacturers and suppliers of various kinds of development
tools and Real Time Embedded Operating Systems for developing and debugging different
control unit related applications.
o Tools fall into two categories, namely embedded software application development tools and
embedded hardware development tools.
o Some of the leading suppliers of tools and platforms in automotive embedded applications are
listed below:
a) ENEA (www.enea.com): Enea Embedded Technology is the developer of the OSE
Real-Time operating system. The OSE RTOS supports both CPU and DSP and has also been
specially developed to support multi-core and fault-tolerant system development.
b) The MathWorks (www.mathworks.com): It is the world's leading developer and
supplier of technical software. It offers a wide range of tools, consultancy and training for
numeric computation, visualization, modeling and simulation across many different
industries. MathWorks' breakthrough product is MATLAB, a high-level programming
language and environment for technical computation and numerical analysis. Together
MATLAB, SIMULINK, Stateflow and Real-Time Workshop provide top quality tools
for data analysis, test & measurement, application development and deployment, image
processing and development of dynamic and reactive systems for DSP and control
applications.
c) Keil Software (www.keil.com): The Integrated Development Environment Keil
µVision from Keil Software is a powerful embedded software design tool for the 8051 &
C166 families of microcontrollers.
3. Solution Providers: Solution providers supply Original Equipment Manufacturer (OEM) and
complete solutions for automotive applications, making use of the chips, platforms and different
development tools.
o The major players of this domain are listed below:
a) Bosch Automotive (www.boschindia.com): Bosch is providing complete automotive
solution ranging from body electronics, diesel engine control, gasoline engine control,
power train systems, safety systems, in-car navigation systems and infotainment systems.
b) DENSO Automotive (www.globaldensoproducts.com): Denso is an OEM and solution
provider for engine management, climate control, body electronics, driving control &
safety, hybrid vehicles, embedded infotainment and communications.
c) Infosys Technologies (www.infosys.com): Infosys is a solution provider for
automotive embedded hardware and software. Infosys provides the competitive edge in
integrating technology change through cost effective solutions.
1. Selecting the Model: In hardware software co-design, models are used for capturing and describing the
system characteristics.
A model is a formal system consisting of objects and composition rules. It is hard to make a
decision on which model should be followed in a particular system design. Most often designers
switch between varieties of models from the requirements specification to the implementation
aspect of the system design. The reason being, the objective varies with each phase.
o For example, at the specification stage, only the functionality of the system is in
focus and not the implementation information. When the design moves to the
implementation aspect, the information about the system component is revealed and the
designer has to switch to a model capable of capturing the system's structure.
2. Selecting the Architecture: A model only captures the system characteristics and does not provide
information on 'how the system can be manufactured'.
The architecture specifies how a system is going to be implemented in terms of the number and
types of different components and the interconnection among them.
Controller architecture, Datapath Architecture, Complex Instruction Set Computing (CISC),
Reduced Instruction Set Computing (RISC), Very Long Instruction Word Computing (VLIW),
Single Instruction Multiple Data (SIMD), Multiple Instruction Multiple Data (MIMD), etc., are
the commonly used architectures in system design.
Some of them fall into Application Specific Architecture Class (like Controller Architecture),
while others fall into either General Purpose Architecture Class (CISC, RISC, etc.) or Parallel
Processing Class (like VLIW, SIMD, MIMD, etc.).
o The Controller Architecture implements the finite state machine model (FSM) using a
state register and two combinational circuits. The state register holds the present state and
the combinational circuits implement the logic for next state and output.
o The Datapath Architecture is best suited for implementing the data flow graph model
where the output is generated as a result of a set of predefined computations on the input
data. A datapath represents a channel between the input and output; and in datapath
architecture the datapath may contain registers, counters, register files, memories and
ports along with high speed arithmetic units. Ports connect the datapath to multiple buses.
o The Finite State Machine Datapath (FSMD) architecture combines the controller
architecture with datapath architecture. It implements a controller with datapath. The
controller generates the control input, whereas the datapath processes the data. The
datapath contains two types of I/O ports, out of which one acts as the control port for
receiving/ sending the control signals from/ to the controller unit and the second I/O port
interfaces the datapath with external world for data input and data output.
o The Complex Instruction Set Computing (CISC) architecture uses an instruction set
representing complex operations. It is possible for a CISC instruction set to perform a
large complex operation with a single instruction. The use of a single complex instruction
in place of multiple simple instructions greatly reduces the program memory access and
program memory size requirement. However it requires additional silicon for
implementing microcode decoder for decoding the CISC instruction. The datapath for the
CISC processor is complex.
o The Reduced Instruction Set Computing (RISC) architecture uses an instruction set
representing simple operations, and it requires the execution of multiple RISC instructions
to perform a complex operation. The datapath of the RISC architecture contains a large
register file for storing the operands and output. The RISC instruction set is designed to
operate on registers. RISC architecture supports extensive pipelining.
o The Very Long Instruction Word (VLIW) architecture implements multiple functional
units (ALUs, multipliers, etc.) in the datapath. The VLIW instruction packages one
standard instruction per functional unit of the datapath.
o Parallel Processing architecture implements multiple concurrent Processing Elements
(PEs), and each processing element may be associated with a datapath containing registers and
local memory.
o Single Instruction Multiple Data (SIMD) and Multiple Instruction Multiple Data
(MIMD) architectures are examples of parallel processing architecture.
In SIMD architecture, a single instruction is executed in parallel with the help of
the Processing Elements. The scheduling of the instruction execution and the
controlling of each PE are performed through a single controller. The SIMD
architecture forms the basis of reconfigurable processors.
On the other hand, the processing elements of the MIMD architecture execute
different instructions at a given point of time. The MIMD architecture forms the
basis of multiprocessor systems. The PEs in a multiprocessor system
communicate through mechanisms like shared memory and message passing.
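Going back to the Controller Architecture in the list above, its structure (one state register plus two combinational circuits) can be mirrored in software. The sketch below is only an analogy under assumed names: a hypothetical two-state controller that toggles when the input is 1 and whose output depends on the present state alone.

```c
/* Hypothetical two-state controller used only to illustrate the structure. */
typedef enum { ST_OFF, ST_ON } state_t;

/* Combinational circuit 1: next-state logic. */
static state_t next_state(state_t present, int input)
{
    if (!input) {
        return present;                        /* input 0: hold the state */
    }
    return (present == ST_OFF) ? ST_ON : ST_OFF;  /* input 1: toggle */
}

/* Combinational circuit 2: output logic (depends on the present state only). */
static int output(state_t present)
{
    return present == ST_ON;
}

/* The state register holds the present state between clock ticks. */
typedef struct {
    state_t state_reg;
} controller_t;

/* One clock tick: produce the output for the present state, then latch the next state. */
int controller_clock(controller_t *c, int input)
{
    int out = output(c->state_reg);
    c->state_reg = next_state(c->state_reg, input);
    return out;
}
```

In hardware the two functions would be gate networks and `state_reg` a bank of flip-flops; the FSMD architecture adds a datapath that this controller would drive.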
3. Selecting the Language: A programming language captures a 'Computational Model' and maps it into
architecture. There is no hard and fast rule to specify which language should be used for capturing this
model. A model can be captured using multiple programming languages like C, C++, C#, Java, etc. for
software implementations and languages like VHDL, System C, Verilog, etc. for hardware
implementations. On the other hand, a single language can be used for capturing a variety of models.
Certain languages are good at capturing certain computational models. For example, C++ is a good
candidate for capturing an object oriented model. The only pre-requisite in selecting a programming
language for capturing a model is that the language should capture the model easily.
4. Partitioning System Requirements into Hardware and Software: It may be possible to implement
the system requirements in either hardware or software (firmware). It is a tough decision making task to
figure out which one to opt for. Various hardware software trade-offs are used for making a decision on the
hardware-software partitioning.
In a DFG model, a data path is the data flow path from input to output.
A DFG model is said to be acyclic DFG (ADFG), if it doesn't contain multiple values for the
input variable and multiple output values for a given set of input(s).
Feedback inputs (Output is fed back to Input), events, etc. are examples for non-acyclic inputs.
A DFG model translates the program as a single sequential process execution.
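A small example may make the idea of a data path concrete. The computation below, y = (a + b) - c, is a hypothetical requirement chosen for illustration and is not taken from the notes. Each helper function stands for one process node of the DFG, and the value passed between them is the data flowing along an edge.

```c
/* Each function corresponds to one node (process) in the data flow graph. */
static int node_add(int a, int b) { return a + b; }
static int node_sub(int x, int c) { return x - c; }

/* Data path: inputs a, b, c -> add node -> subtract node -> output y.
 * There is no feedback and no conditional, so the graph is acyclic and the
 * whole requirement runs as a single sequential process. */
int dfg_evaluate(int a, int b, int c)
{
    int x = node_add(a, b);   /* intermediate data on the edge between the nodes */
    return node_sub(x, c);
}
```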
Control Data Flow Graph/ Diagram (CDFG) Model: In a DFG model, the execution is controlled by
data and it doesn't involve any control operations (conditionals).
The Control DFG (CDFG) model is used for modeling applications involving conditional
program execution. CDFG models contain both data operations and control operations.
The CDFG uses the Data Flow Graph (DFG) as an element and conditionals (constructs) as decision
makers. A CDFG contains both data flow nodes and decision nodes, whereas a DFG contains only
data flow nodes. Let us have a look at the implementation of the CDFG for the following
requirement.
If flag = 1, x = a + b; else y = a - b; this requirement contains a decision making process. The
CDFG model for the same is given in the above Figure.
The control node is represented by a 'Diamond' block, which is the decision making element in a
normal flow chart based design. The CDFG translates the modeled requirement into a
concurrent process model. The decision on which process is to be executed is determined by the
control node.
o A real world example for modeling an embedded application using CDFG is the
capturing and saving of an image to a format set by the user in a digital still camera.
Everything is data driven, starting from the Analog Front End, which converts the
CCD sensor generated analog signal to a digital signal, and the task which stores the data
from the ADC to a frame buffer for use by a media processor, which performs various
operations like auto correction, white balance adjusting, etc. The decision on the
format in which the image is stored (formats like JPEG, TIFF, BMP, etc.) is controlled by the
camera settings configured by the user.
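The requirement "If flag = 1, x = a + b; else y = a - b" used earlier to introduce the CDFG maps directly onto C. In the sketch below the `if` plays the role of the control (decision) node and each assignment is a data flow node; the `cdfg_result_t` structure and the function name are assumptions introduced so that the result can be returned and checked.

```c
/* Holds the two variables named in the requirement. */
typedef struct {
    int x;
    int y;
} cdfg_result_t;

/* Evaluate the requirement once, starting from the previous values of x and y. */
cdfg_result_t cdfg_evaluate(int flag, int a, int b, cdfg_result_t prev)
{
    cdfg_result_t r = prev;   /* the variable on the branch not taken keeps its value */

    if (flag == 1) {          /* control node: the 'Diamond' block */
        r.x = a + b;          /* data flow node on the true branch */
    } else {
        r.y = a - b;          /* data flow node on the false branch */
    }
    return r;
}
```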
State Machine Model: The State Machine model is used for modeling reactive or event-driven
embedded systems whose processing behavior is dependent on state transitions. Embedded systems used
in the control and industrial applications are typical examples for event driven systems.
The State Machine model describes the system behavior with 'States', 'Events', 'Actions' and
'Transitions'.
o State is a representation of a current situation.
o An event is an input to the state. The event acts as a stimulus for state transition.
o Transition is the movement from one state to another.
o Action is an activity to be performed by the state machine.
A Finite State Machine (FSM) model is one in which the number of states is finite. In other
words, the system is described using a finite number of possible states.
o As an example let us consider the design of an embedded system for driver/ passenger
'Seat Belt Warning' in an automobile using the FSM model. The system requirements are
captured as:
o When the vehicle ignition is turned on and the seat belt is not fastened within 10 seconds
of ignition ON, the system generates an alarm signal for 5 seconds.
o The alarm is turned off when the alarm time (5 seconds) expires, or if the driver/
passenger fastens the belt, or if the ignition switch is turned off, whichever happens first.
o Here the states are 'Alarm Off', 'Waiting' and 'Alarm On' and the events are 'Ignition Key
ON', 'Ignition Key OFF', 'Timer Expire', 'Alarm Time Expire' and 'Seat Belt ON'.
o Using the FSM, the system requirements can be modeled as given in following Figure.
o The 'Ignition Key ON' event triggers the 10 second timer and transitions the state to
'Waiting'.
o If a 'Seat Belt ON' or 'Ignition Key OFF' event occurs during the wait state, the state
transitions into 'Alarm Off'.
o When the wait timer expires in the waiting state, the event 'Timer Expire' is generated and
it transitions the state to 'Alarm On' from the 'Waiting' state.
o The 'Alarm On' state continues until a 'Seat Belt ON' or 'Ignition Key OFF' event or
'Alarm Time Expire' event, whichever occurs first. The occurrence of any of these events
transitions the state to 'Alarm Off'.
o The wait state is implemented using a timer. The timer also has certain set of states and
events for state transitions. Using the FSM model, the timer can be modeled as shown in
the following Figure.
As seen from the FSM, the timer state can be either 'IDLE' or 'READY' or 'RUNNING'.
o During the normal condition when the timer is not running, it is said to be in the 'IDLE'
state.
o The timer is said to be in the 'READY' state when the timer is loaded with the count
corresponding to the required time delay. The timer remains in the 'READY' state until a
'Start Timer' event occurs.
o The timer changes its state to 'RUNNING' from the 'READY' state on receiving a 'Start
Timer' event and remains in the 'RUNNING' state until the timer count expires or a 'Stop
Timer' event occurs. The timer state changes to 'IDLE' from 'RUNNING' on receiving a
'Stop Timer' or 'Timer Expire' event.
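The 'Seat Belt Warning' FSM described earlier (states 'Alarm Off', 'Waiting' and 'Alarm On') can be sketched in C as a transition function. The states and events follow the text; the identifier names and the choice to ignore events that are not listed for a state are assumptions of this sketch. Actions such as starting the timers or the alarm are only noted in comments.

```c
typedef enum { ALARM_OFF, WAITING, ALARM_ON } sb_state_t;

typedef enum {
    EV_IGNITION_KEY_ON,
    EV_IGNITION_KEY_OFF,
    EV_TIMER_EXPIRE,        /* 10 second wait timer expired */
    EV_ALARM_TIME_EXPIRE,   /* 5 second alarm timer expired */
    EV_SEAT_BELT_ON
} sb_event_t;

/* Return the state reached from 's' when event 'ev' occurs. */
sb_state_t seat_belt_fsm(sb_state_t s, sb_event_t ev)
{
    switch (s) {
    case ALARM_OFF:
        if (ev == EV_IGNITION_KEY_ON) {
            return WAITING;              /* action: start the 10 second timer */
        }
        break;
    case WAITING:
        if (ev == EV_SEAT_BELT_ON || ev == EV_IGNITION_KEY_OFF) {
            return ALARM_OFF;
        }
        if (ev == EV_TIMER_EXPIRE) {
            return ALARM_ON;             /* action: start alarm and 5 second timer */
        }
        break;
    case ALARM_ON:
        if (ev == EV_SEAT_BELT_ON || ev == EV_IGNITION_KEY_OFF ||
            ev == EV_ALARM_TIME_EXPIRE) {
            return ALARM_OFF;            /* action: stop the alarm */
        }
        break;
    }
    return s;   /* events not listed for the current state are ignored */
}
```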
Imp: Example 1: Design an automatic tea/ coffee vending machine based on the FSM model
for the following requirement.
The tea/ coffee vending is initiated by user inserting a 5 rupee coin. After inserting the coin, the
user can either select 'Coffee' or 'Tea' or press 'Cancel' to cancel the order and take back the coin.
The FSM representation for the above requirement is given in the following Figure.
The FSM representation contains four states, namely 'Wait for Coin', 'Wait for User Input',
'Dispense Tea' and 'Dispense Coffee'.
The event 'Insert Coin' (5 rupee coin insertion) transitions the state to 'Wait for User Input'. The
system stays in this state until a user input is received from the buttons 'Cancel', 'Tea' or 'Coffee'.
If the event triggered in the 'Wait for User Input' state is a 'Cancel' button press, the coin is pushed
out and the state transitions to 'Wait for Coin'. If the event received in the 'Wait for User Input' state
is either a 'Tea' button press or a 'Coffee' button press, the state changes to 'Dispense Tea' or
'Dispense Coffee' respectively.
Once the coffee/ tea vending is over, the respective state transitions back to the 'Wait for Coin'
state.
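The vending machine FSM above can also be written as a state transition table, which is a common way to implement an FSM in firmware. The four states and the 'Insert Coin', 'Cancel', 'Tea' and 'Coffee' events come from the text; the `EV_DISPENSE_DONE` event (vending is over) and all identifier names are assumptions of this sketch.

```c
typedef enum {
    WAIT_FOR_COIN, WAIT_FOR_USER_INPUT, DISPENSE_TEA, DISPENSE_COFFEE,
    VM_NUM_STATES
} vm_state_t;

typedef enum {
    EV_INSERT_COIN, EV_CANCEL, EV_TEA, EV_COFFEE, EV_DISPENSE_DONE,
    VM_NUM_EVENTS
} vm_event_t;

/* vm_table[current state][event] = next state.
 * An entry equal to the row's own state means the event is ignored there. */
static const vm_state_t vm_table[VM_NUM_STATES][VM_NUM_EVENTS] = {
    /* WAIT_FOR_COIN       */ { WAIT_FOR_USER_INPUT, WAIT_FOR_COIN, WAIT_FOR_COIN,
                                WAIT_FOR_COIN, WAIT_FOR_COIN },
    /* WAIT_FOR_USER_INPUT */ { WAIT_FOR_USER_INPUT, WAIT_FOR_COIN, DISPENSE_TEA,
                                DISPENSE_COFFEE, WAIT_FOR_USER_INPUT },
    /* DISPENSE_TEA        */ { DISPENSE_TEA, DISPENSE_TEA, DISPENSE_TEA,
                                DISPENSE_TEA, WAIT_FOR_COIN },
    /* DISPENSE_COFFEE     */ { DISPENSE_COFFEE, DISPENSE_COFFEE, DISPENSE_COFFEE,
                                DISPENSE_COFFEE, WAIT_FOR_COIN },
};

/* Look up the next state; out-of-range arguments fall back to WAIT_FOR_COIN. */
vm_state_t vending_fsm(vm_state_t s, vm_event_t ev)
{
    if ((unsigned)s >= VM_NUM_STATES || (unsigned)ev >= VM_NUM_EVENTS) {
        return WAIT_FOR_COIN;
    }
    return vm_table[s][ev];
}
```

Compared with a `switch` based implementation, the table keeps all transitions visible in one place, at the cost of one entry per state and event pair.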
Example 2: Design a coin operated public telephone unit based on the FSM model for
the following requirements.
1. The calling process is initiated by lifting the receiver (off-hook) of the telephone unit
2. After lifting the phone the user needs to insert a 1 rupee coin to make the call
3. If the line is busy, the coin is returned on placing the receiver back on the hook (on-hook)
4. If the line is through, the user is allowed to talk for up to 60 seconds, and at the end of the 45th
second a prompt for inserting another 1 rupee coin to continue the call is initiated
5. If the user doesn't insert another 1 rupee coin, the call is terminated on completing the 60 second
time slot
6. The system is ready to accept a new call request when the receiver is placed back on the hook
(on-hook)
7. The system goes to the 'Out of Order' state when there is a line fault.
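The timing rules in requirements 4 and 5 can be sketched as a small decision function that a 'call in progress' state might evaluate once per second. The function name, the action names and the assumption that an extra coin simply lets the call continue past the 60 second slot are illustrative choices, not part of the stated requirements.

```c
typedef enum { CALL_CONTINUE, CALL_PROMPT_COIN, CALL_TERMINATE } call_action_t;

#define SLOT_SECONDS   60   /* talk time bought by one coin */
#define PROMPT_SECOND  45   /* prompt for another coin from this point on */

/* elapsed:       seconds since the current time slot began
 * coin_inserted: nonzero if another coin was inserted during this slot */
call_action_t call_tick(int elapsed, int coin_inserted)
{
    if (elapsed >= SLOT_SECONDS) {
        /* slot over: continue only if the user paid for more time */
        return coin_inserted ? CALL_CONTINUE : CALL_TERMINATE;
    }
    if (elapsed >= PROMPT_SECOND && !coin_inserted) {
        return CALL_PROMPT_COIN;
    }
    return CALL_CONTINUE;
}
```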
Most of the time, the state machine model translates the requirements into a sequence driven program,
and it is difficult to implement concurrent processing with FSM. This limitation is addressed by
the Hierarchical/ Concurrent Finite State Machine model (HCFSM).
The HCFSM is an extension of the FSM for supporting concurrency and hierarchy.
HCFSM extends the conventional state diagrams by the AND, OR decomposition of States
together with inter level transitions and a broadcast mechanism for communicating between
concurrent processes.
HCFSM uses statecharts for capturing the states, transitions, events and actions.
The Harel Statechart, UML State diagram, etc. are examples for popular statecharts used for the
HCFSM modeling of embedded systems.
Sequential Program Model: In the sequential programming model, the functions or processing
requirements are executed in sequence. It is the same as conventional procedural programming.
Here the program instructions are iterated and executed conditionally, and the data gets
transformed through a series of operations.
FSMs are a good choice for sequential program modeling.
Another important tool used for modeling sequential programs is the Flow Chart.
The FSM approach represents the states, events, transitions and actions, whereas the
flow chart models the execution flow.
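#define ON 1
#define OFF 0
#define YES 1
#define NO 0

/* Helper routines are assumed to be provided elsewhere in the firmware. */
void wait_10sec(void);
int check_ignition_key(void);
int check_seat_belt(void);
void set_timer(int seconds);
int timer_expire(void);
void start_alarm(void);
void stop_alarm(void);

void seat_belt_warn(void)
{
    wait_10sec();                        /* wait 10 seconds after ignition ON */
    if (check_ignition_key() == ON)
    {
        if (check_seat_belt() == OFF)
        {
            set_timer(5);                /* alarm duration: 5 seconds */
            start_alarm();
            /* keep the alarm on until the belt is fastened, the ignition
               is switched off or the alarm timer expires */
            while ((check_seat_belt() == OFF) &&
                   (check_ignition_key() == ON) &&
                   (timer_expire() == NO));
            stop_alarm();
        }
    }
}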
The figure below shows the flowchart for seat belt monitoring in the sequential program model.
Concurrent/ Communicating Process Model: The concurrent or communicating process model
models concurrently executing tasks/ processes.
It is easier to implement certain requirements in concurrent processing model than the
conventional sequential execution.
Sequential execution leads to a single sequential execution of task and thereby leads to poor
processor utilization, when the task involves I/O waiting, sleeping for specified duration etc.
If the task is split into multiple subtasks, it is possible to tackle the CPU usage effectively, when
the subtask under execution goes to a wait or sleep mode, by switching the task execution.
However, concurrent processing model requires additional overheads in task scheduling, task
synchronization and communication.
As an example for the concurrent processing model let us examine how we can implement the
'Seat Belt Warning' system in concurrent processing model. We can split the tasks into:
1. Timer task for waiting 10 seconds (wait timer task)
2. Task for checking the ignition key status (ignition key status monitoring task)
3. Task for checking the seat belt status (seat belt status monitoring task)
4. Task for starting and stopping the alarm (alarm control task)
5. Alarm timer task for waiting 5 seconds (alarm timer task)
We have five tasks here and we cannot execute them randomly or sequentially. We need to
synchronize their execution through some mechanism.
We need to start the alarm only after the expiration of the 10 second wait timer, and that too only
if the seat belt is OFF and the ignition key is ON. Hence the alarm control task is executed only
when the wait timer has expired and if the ignition key is in the ON state and the seat belt is in the
OFF state.
One way of implementing a concurrent model for the 'Seat Belt Warning' system is illustrated
in the following Figure.
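To give a feel for how the five tasks could be synchronized, the sketch below simulates them on a single thread: every call to `sb_tick` stands for one second, and the tasks communicate through a shared structure with an event flag. This is a simplification under stated assumptions; in a real system an RTOS would schedule the tasks and the flag would be a semaphore or event object. All names are hypothetical.

```c
#include <stdbool.h>

/* State shared between the tasks. */
typedef struct {
    bool ignition_on;      /* written by the ignition key status monitoring task */
    bool belt_fastened;    /* written by the seat belt status monitoring task */
    int  wait_remaining;   /* seconds left on the 10 second wait timer */
    int  alarm_remaining;  /* seconds left on the 5 second alarm timer */
    bool wait_expired;     /* event: wait timer task -> alarm control task */
    bool alarm_on;
} sb_shared_t;

static void wait_timer_task(sb_shared_t *s)
{
    if (s->wait_remaining > 0 && --s->wait_remaining == 0) {
        s->wait_expired = true;          /* signal the alarm control task */
    }
}

static void alarm_timer_task(sb_shared_t *s)
{
    if (s->alarm_on && s->alarm_remaining > 0) {
        s->alarm_remaining--;
    }
}

static void alarm_control_task(sb_shared_t *s)
{
    if (s->wait_expired) {
        s->wait_expired = false;         /* consume the event */
        if (s->ignition_on && !s->belt_fastened) {
            s->alarm_on = true;
            s->alarm_remaining = 5;
        }
    } else if (s->alarm_on &&
               (s->belt_fastened || !s->ignition_on || s->alarm_remaining == 0)) {
        s->alarm_on = false;
    }
}

/* One scheduler pass, representing one second of time. */
void sb_tick(sb_shared_t *s, bool ignition_on, bool belt_fastened)
{
    s->ignition_on = ignition_on;        /* ignition key status monitoring task */
    s->belt_fastened = belt_fastened;    /* seat belt status monitoring task */
    wait_timer_task(s);
    alarm_timer_task(s);
    alarm_control_task(s);
}
```

To use it, set `wait_remaining` to 10 when the ignition is switched on and then call `sb_tick` once per second with the current key and belt status.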
However classes derived from a parent class can also access the protected member functions and
variables.
The concept of object and class brings abstraction, hiding and protection.
Pros
Doesn't require an Operating System for task scheduling and monitoring, and is free
from OS related overheads
Simple and straightforward design
Reduced memory footprint
Cons
Non real-time in execution behavior (as the number of tasks increases, the frequency at
which a task gets CPU time for execution decreases)
Any issue in any task execution may affect the functioning of the product (this can be
effectively tackled by using Watch Dog Timers for task execution monitoring)
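These points read as describing a design in which the tasks run one after another in an endless loop without an OS (commonly called a super loop). Assuming that, the sketch below shows the shape of such a loop. The task functions, the counters and `kick_watchdog` are stand-ins invented for the example; on real hardware the watchdog is a peripheral register and the loop is `while (1)` rather than a bounded `for`.

```c
#include <stddef.h>

typedef void (*task_fn)(void);

/* Stand-ins for real firmware tasks; they only count how often they ran. */
static unsigned a_runs, b_runs;
static void task_a(void) { a_runs++; }
static void task_b(void) { b_runs++; }

/* Stand-in for refreshing a hardware watchdog timer. */
static unsigned watchdog_kicks;
static void kick_watchdog(void) { watchdog_kicks++; }

/* Run every task in order, 'passes' times. Firmware would loop forever. */
void super_loop(const task_fn *tasks, size_t count, unsigned passes)
{
    for (unsigned p = 0; p < passes; p++) {
        for (size_t i = 0; i < count; i++) {
            tasks[i]();        /* a task that hangs here stalls all the others */
        }
        kick_watchdog();       /* reached only if every task returned */
    }
}
```

Each task gets the CPU once per pass, so adding tasks lengthens the pass and every task runs less often. If a task never returns, the watchdog is not refreshed and would reset the system.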
Enhancements
The Embedded OS is responsible for scheduling the execution of user tasks and the
allocation of system resources among multiple tasks
Involves a lot of OS related overheads apart from managing and executing user defined
tasks
Examples of GPOS for embedded devices: Microsoft® Windows XP Embedded.
Examples of embedded devices running on embedded GPOSs: Point of Sale (PoS)
terminals, Gaming Stations, Tablet PCs, etc.
Examples of RTOSs employed in embedded product development: 'Windows
CE', 'Windows Mobile', 'QNX', 'VxWorks', 'ThreadX', 'MicroC/OS-II', 'Embedded
Linux', 'Symbian', etc.
Examples of embedded devices that run on RTOSs: Mobile Phones, PDAs, Flight
Control Systems, etc.
Embedded firmware Development Languages/Options
Assembly Language
High Level Language
    Subset of C (Embedded C)
    Subset of C++ (Embedded C++)
    Any other high level language with supported Cross-compiler
Mix of Assembly & High level Language
    Mixing High Level Language (Like C) with Assembly Code
    Mixing Assembly code with High Level Language (Like C)
    Inline Assembly
Assembly Language
'Assembly Language' is the human readable notation of 'machine language'.
'Machine language' is a processor understandable language.
Machine language is a binary representation and it consists of 1s and 0s.
Assembly language and machine languages are processor/controller dependent.
An Assembly language program written for one processor/controller family will not work with
others.
Assembly language programming is the process of writing processor specific
machine code in mnemonic form, converting the mnemonics into actual processor
instructions (machine language) and associated data using an assembler.
The general format of an assembly language instruction is an Opcode followed by
Operands.
The Opcode tells the processor/controller what to do.
The Operands provide the data and information required to perform the action specified by the
opcode.
It is not necessary that all opcodes should have Operands following them.
Some Opcodes implicitly contain the operand, and in such situations no operand
is required.
The operand may be a single operand, dual operand or more.
The 8051 Assembly Instruction
MOV A, #30
moves decimal value 30 to the 8051 Accumulator register.
Here MOV A is the Opcode and 30 is the operand (single operand).
The same instruction when written in machine language will look like
01110100 00011110
The first 8 bit binary value 01110100 represents the opcode for MOV A.
The second 8 bit binary value 00011110 represents the operand 30.
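The mapping from MOV A, #30 to the two bytes above is essentially what an assembler does for this instruction. The sketch below reproduces it in C; the function names are hypothetical, and only this one instruction form (8051 opcode 74H followed by the immediate byte) is handled.

```c
#include <stdint.h>

#define OPCODE_MOV_A_IMM 0x74u   /* 01110100b: MOV A, #data on the 8051 */

/* Write the two machine code bytes for MOV A, #value into out[0] and out[1]. */
void encode_mov_a_imm(uint8_t value, uint8_t out[2])
{
    out[0] = OPCODE_MOV_A_IMM;   /* opcode byte */
    out[1] = value;              /* operand byte */
}

/* Render a byte as 8 binary digits, most significant bit first. */
void byte_to_binary(uint8_t b, char text[9])
{
    for (int i = 0; i < 8; i++) {
        text[i] = (b & (0x80u >> i)) ? '1' : '0';
    }
    text[8] = '\0';
}
```

Calling `encode_mov_a_imm(30, code)` and printing both bytes with `byte_to_binary` gives the two bit patterns quoted above.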
Assembly language instructions are written one per line
A machine code program consists of a sequence of assembly language instructions, where
each statement contains a mnemonic (Opcode + Operand)
Each line of an assembly language program is split into four fields as: LABEL, OPCODE, OPERAND
and COMMENTS.
The symbol ; represents the start of a comment. The assembler ignores the text in a line after the ;
symbol while assembling the program.
DELAY is a label representing the start address of the memory location where the
piece of code is located in code memory.
The above piece of code can be executed by giving the label DELAY as part of the instruction,
e.g. LCALL DELAY; LJMP DELAY
Source File to Object File Translation: Translation of assembly code to machine code is performed by
assembler. The assemblers for different target machines are different. Assemblers from multiple vendors are
available in market. A51 Macro assembler from Keil software is a popular assembler for the 8051 family
microcontroller.
The various steps involved in the conversion of a program written in assembly language to the
corresponding binary file/ machine language are illustrated in the following Figure.
Each source module is written in Assembly and is stored as .src file or .asm file. Each file can be
assembled separately to examine the syntax errors and incorrect assembly instructions. On
successful assembling of each .src/ .asm file a corresponding object file is created with extension
'.obj'.
The object file does not contain the absolute address of where the generated code needs to be
placed on the program memory and hence it is called re-locatable segment. It can be placed at
Library File Creation and Usage: Libraries are specially formatted, ordered program
collections of object modules that may be used by the linker at a later time. When the
linker processes a library, only those object modules in the library that are necessary to create the
program are used. Library files are generated with extension '.lib'.
A library is a kind of source code hiding technique. If you don't want to reveal the source code
behind the various functions you have written in your program, and at the same time you want
them to be distributed to application developers for use in their applications, you
can supply them as library files and give the developers the details of the public functions available
from the library (function name, function input/output, etc.). To use a library file in a project, add
the library to the project.
o 'LIB51' from Keil Software is an example for a library creator and it is used for creating
library files for A51 Assembler/ C51 Compiler for 8051 specific controller.
Linker and Locater: Linker and Locater is another software utility responsible for "linking the various
object modules in a multi-module project and assigning absolute address to each module".
Linker is a program which combines the target program with the code of other programs
(modules) and library routines.
During the process of linking, the absolute object module is created. The object module contains
the target code and information about other programs and library routines that are required to call
during the program execution.
An absolute object file or module does not contain any re-locatable code or data. All code and
data reside at fixed memory locations. The absolute object file is used for creating hex files for
dumping into the code memory of the processor/ controller.
'BL51' from Keil Software is an example for a Linker & Locater for A51 Assembler/ C51
Compiler for 8051 specific controller.
Object to Hex File Converter: This is the final stage in the conversion of Assembly language
(mnemonics) to machine understandable language (machine code).
Hex File is the representation of the machine code and the hex file is dumped into the
code memory of the processor/ controller.
The hex file representation varies depending on the target processor/ controller make.
o For Intel processors/ controllers the target hex file format will be 'Intel HEX' and for
Motorola, the hex file should be in 'Motorola HEX' format.
HEX files are ASCII files that contain a hexadecimal representation of target application. Hex
file is created from the final 'Absolute Object File' using the Object to Hex File Converter utility.
'QH51' from Keil software is an example for Object to Hex File Converter utility for A51
Assembler/ C51 Compiler for 8051 specific controller.
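As an illustration of the 'Intel HEX' format mentioned above, the following minimal sketch in C builds one data record. Each record is an ASCII line that starts with ':' and is followed by the byte count, a 16-bit address, the record type, the data bytes and a checksum (the two's complement of the low byte of the sum of all preceding record bytes). The function names are hypothetical and are not part of any Keil utility; only data records (type 00) are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Checksum of an Intel HEX record: two's complement of the low byte of the
 * sum of the record bytes (byte count, address, record type, data). */
static uint8_t ihex_checksum(const uint8_t *bytes, size_t n)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum = (uint8_t)(sum + bytes[i]);
    return (uint8_t)(0u - sum);
}

/* Format one data record (type 00) as ASCII into 'out'.
 * Returns the number of characters written, or -1 if 'out' is too small. */
static int ihex_format_data_record(uint16_t addr, const uint8_t *data,
                                   uint8_t len, char *out, size_t out_size)
{
    uint8_t rec[4 + 255];
    /* ':' + two hex digits per byte (header, data, checksum) + NUL */
    size_t needed = 1 + 2 * (4 + (size_t)len + 1) + 1;
    int pos;

    assert(out != NULL);
    if (out_size < needed)
        return -1;

    rec[0] = len;                    /* byte count          */
    rec[1] = (uint8_t)(addr >> 8);   /* address, high byte  */
    rec[2] = (uint8_t)(addr & 0xFF); /* address, low byte   */
    rec[3] = 0x00;                   /* record type: data   */
    if (len > 0)
        memcpy(&rec[4], data, len);

    pos = snprintf(out, out_size, ":");
    for (size_t i = 0; i < 4 + (size_t)len; i++)
        pos += snprintf(out + pos, out_size - (size_t)pos, "%02X", rec[i]);
    pos += snprintf(out + pos, out_size - (size_t)pos, "%02X",
                    ihex_checksum(rec, 4 + (size_t)len));
    return pos;
}
```

For example, formatting the three bytes 02 33 7A at address 0030h produces the record `:0300300002337A1E`, where 1E is the checksum.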
Drawbacks of Assembly Language Based Development: Every technology has its own pros and
cons. In certain technical respects assembly language development is the most efficient technique,
but it also has the following technical limitations.
High Development Time: Assembly language is much harder to program than high level
languages. The developer must pay attention to more details and must have thorough knowledge
of the architecture, memory organization and register details of the target processor in use.
Learning the inner details of the processor and its assembly instructions is highly time consuming
and it creates a delay impact in product development.
Developer Dependency: There is no common written rule for developing assembly language
based applications, whereas all high level languages impose a certain set of rules for application
development. In assembly language programming, the developer has the freedom to
choose different memory locations and registers. The programming approach also varies from
developer to developer depending on his/ her taste.
o For example, moving data from a memory location to the accumulator can be achieved
through different approaches.
o If the approach taken by a developer is not documented properly at the development
stage, he/ she may not be able to recollect at a later stage why this approach was followed,
and a new developer instructed to analyze the code may not be able
to understand it either. Hence upgrading an assembly program at a later stage is very difficult.
Non-Portable: Target applications written in assembly instructions are valid only for that
particular family of processors (e.g. an application written for the Intel x86 family of processors) and
cannot be re-used for other target processors/ controllers (say, the ARM Cortex M family of
processors). If the target processor/ controller changes, a complete re-write of the application
using the assembly instructions for the new target processor/ controller is required.
‘C’ is a well defined, easy to use high level language with extensive cross platform
development tool support. Nowadays cross-compilers for C++ are also emerging and
embedded developers are making use of C++ for embedded application development.
The various steps involved in high level language based embedded firmware development are the
same as those of assembly language based development, except that the conversion of a source file
written in a high level language to an object file is done by a cross-compiler, whereas in assembly
language based development it is carried out by an assembler.
The various steps involved in the conversion of a program written in a high level language to the
corresponding binary file/ machine language are illustrated in the following Figure.
The program written in any of the high level languages is saved with the corresponding language
extension (.c for C, .cpp for C++ etc). Any text editor like 'Notepad' or 'WordPad' from
Microsoft® or the text editor provided by an Integrated Development Environment (IDE) tool supporting the
high level language can be used for writing the program.
Most of the high level languages support a modular programming approach and hence you can
have multiple source files, called modules, written in the corresponding high level language.
Translation of high level source code to executable object code is done by a cross-compiler. The
cross-compilers for different high level languages for the same target processor are different.
C51 is a popular cross-compiler available for the 'C' language for the 8051 family of
microcontrollers. Conversion of each module's source code to the corresponding object file is performed by
the cross-compiler.
The rest of the steps involved in the conversion of a high level language to the target processor's machine
code are the same as the steps involved in assembly language based development.
Mixing Assembly with High Level Language (e.g. Assembly Language with ‘C’): Assembly routines are
mixed with 'C' in situations where the entire program is written in 'C' and the cross-compiler does not have
built-in support for implementing certain features like Interrupt Service Routine (ISR) functions, or when the
programmer wants to take advantage of the speed and optimized code offered by hand written
assembly rather than cross-compiler generated machine code.
When accessing certain low level hardware, the timing specifications may be very critical and a
cross-compiler generated binary may not be able to meet the required timing specifications accurately. Writing
the hardware/ peripheral access routine in processor/ controller specific Assembly language and invoking
it from 'C' is the most advised method to handle such situations.
Mixing 'C' and Assembly is a little complicated, in the sense that the programmer must be aware of how
parameters are passed from the 'C' routine to Assembly, how values are returned from the assembly routine to 'C',
and how the 'Assembly routine' is invoked from the 'C' code.
The following steps give an idea how C51 cross-compiler performs the mixing of Assembly code with
'C':
1. Write a simple function in C that passes parameters and returns values the way you want your
assembly routine to.
2. Use the SRC directive (#pragma SRC at the top of the file) so that the C compiler generates an
.SRC file instead of an .OBJ file.
3. Compile the C file. Since the SRC directive is specified, the .SRC file is generated. The .SRC file
contains the assembly code generated for the C code you wrote.
4. Rename the .SRC file to .A51 file.
5. Edit the .A51 file and insert the assembly code you want to execute in the body of the assembly
function shell included in the .A51 file.
Mixing High level language with Assembly (e.g. 'C' with Assembly Language): Mixing the code written
in a high level language like 'C' and Assembly language is useful in the following scenarios:
1. The source code is already available in Assembly language and a routine written in a high level
language like 'C' needs to be included in the existing code.
2. The entire source code is planned in Assembly code for various reasons like optimized code,
optimal performance, efficient code memory utilization and proven expertise in handling the
Assembly, etc. But some portions of the code may be very difficult and tedious to code in
Assembly. For example 16-bit multiplication and division in 8051 Assembly Language.
3. To include built in library functions written in 'C' language provided by the cross compiler. For
example: Built in Graphics library functions and String operations supported by 'C'.
Inline Assembly: Inline assembly is another technique for inserting target processor/ controller specific
Assembly instructions at any location of a source code written in high level language ‘C’. This avoids the
delay in calling an assembly routine from a ‘C’ code. Special keywords are used to indicate the start and
end of Assembly instructions. C51 uses the keywords #pragma asm and #pragma endasm to indicate a
block of code written in assembly.
‘C’ versus ‘Embedded C’: 'C' is a well structured, well defined and standardized general purpose
programming language with extensive bit manipulation support. 'C' offers a combination of the features of
high level language and assembly and helps in hardware access programming (system level
programming) as well as business package developments (application developments like payroll
systems, banking applications, etc). The conventional 'C' language follows the ANSI standard and it
incorporates various library files for different operating systems. A platform (operating system) specific
application, known as a compiler, is used for converting programs written in 'C' to binary files
specific to the target processor (on which the OS is running). Hence it is a platform specific development.
Embedded C can be considered as a subset of conventional ‘C’ language. Embedded C supports all 'C'
instructions and incorporates a few target processor specific functions/ instructions. It should be noted
that the standard ANSI 'C' library implementation is always tailored to the target processor/ controller
library files in Embedded C. The implementation of target processor/ controller specific functions/
instructions depends upon the processor/ controller as well as supported cross-compiler for the particular
Embedded C language. A software program called 'Cross-compiler' is used for the conversion of
programs written in Embedded C to target processor/ controller specific instructions (machine language).
C | Embedded C
C is a general purpose programming language, which can be used to design any type of desktop-based applications. | Embedded C is an extension of C language and used to develop Microcontroller-based applications.
C language program is hardware independent. | Embedded C program is hardware dependent.
C language uses standard compilers to compile and execute the program. | Embedded C requires specific compilers that are able to generate particular hardware/ Microcontroller-based output.
Readability, modifications, bug fixing, etc., are very easy in a C language program. | Readability, modifications, bug fixing, etc., are not easy in an Embedded C language program.
Compiler versus Cross-Compiler: A compiler is a software tool that runs on top of a particular
operating system on a specific target processor architecture (e.g. Intel x86/ Pentium) and converts
source code written in a high level language into machine code. Here the operating system, the compiler program and the
application making use of the source code run on the same target processor. The source code is converted
to the target processor specific machine instructions. The development is platform specific (OS as well as
target processor on which the OS is running). Compilers are generally termed 'Native Compilers'. A
native compiler generates machine code for the same machine (processor) on which it is running;
it converts the high level language into the computer's native language. The Turbo C compiler is an example.
NOTE: The term 'Compiler' is used interchangeably with 'Cross-compiler' in embedded firmware
applications. Whenever you see the term 'Compiler' related to any embedded firmware application, please
understand that it refers to the cross-compiler.
Question Bank
1. Explain characteristics of an embedded system?
2. Explain Quality attributes of embedded system?
3. Explain the Operational and non operational attributes of an Embedded System.
4. Explain data flow graph and control data flow graph.
5. With a neat block diagram, explain the design and working of a Washing machine - Application
specific embedded system.
6. Explain Automotive - Domain specific embedded system?
7. Explain the different communication buses used in automotive application.
8. What is hardware software co-design? Explain fundamental issues in hardware software
co-design.
9. With a state diagram explain automatic seat belt control problem.
10. Explain with a neat block diagram, how source file to object file translation takes
place.
11. Explain how assembly language source file is translated to machine language object file.
12. With neat sketch explain various computational models in embedded system?
13. Explain the different Embedded firmware design approaches in detail?
14. Explain Super loop based approach of embedded firmware design.
15. With FSM model, explain the design and operation of automatic tea/coffee vending machine.
16. Differentiate between
a. Compiler and Cross Compiler.
b. C with Embedded C
17. Explain the following embedded firmware development languages?
MODULE – 5
RTOS AND IDE FOR EMBEDDED SYSTEM DESIGN
The following Figure gives an insight into the basic components of an operating system and their
interfaces with rest of the world.
[Figure: operating system architecture. Labels: User Applications; Application Programming Interface (API); Kernel Services; Memory Management; Process Management; Time Management]
Primary Memory Management: Primary memory refers to a volatile memory (RAM), where processes
are loaded and variables and shared data are stored.
The Memory Management Unit (MMU) of the kernel is responsible for –
Keeping a track of which part of the memory area is currently used by which process
Allocating and De-allocating memory space on a need basis.
File System Management: File is a collection of related information. A file could be a program (source
code or executable), text files, image files, word documents, audio/ video files, etc. A file system
management service of kernel is responsible for –
The creation, deletion and alteration of files
Creation, deletion, and alteration of directories
Saving of files in the secondary storage memory
Providing automatic allocation of file space based on the amount of free space available
Providing a flexible naming convention for the files.
I/O System (Device) Management: Kernel is responsible for routing the I/O requests coming from
different user applications to the appropriate I/O devices of the system. In a well structured OS, direct
access to I/O devices is not allowed; access to them is established through the Application Programming
Interface (API). The kernel maintains a list of all the I/O devices of the system. The 'Device
Manager' service of the kernel is responsible for handling all I/O related operations. The Device Manager is
responsible for –
Loading and unloading of device drivers
Exchanging information and the system specific control signals to and from the device.
Secondary Storage Management: The secondary storage management deals with managing the
secondary storage memory devices (if any) connected to the system. Secondary memory is used as
backup medium for programs and data, as main memory is volatile. In most of the systems secondary
storage is kept in disks (hard disks). The secondary storage management service of kernel deals with –
Disk storage allocation
Disk scheduling
Free disk space management
Protection Systems: Modern operating systems are designed to support multiple users with
different levels of access permissions. Protection deals with implementing the security policies that
restrict access to system resources, and to a particular user's resources, by different applications,
processes and users.
Interrupt Handler: The kernel provides an interrupt handler mechanism for all external/ internal interrupts
generated by the system.
Microkernel: The microkernel design incorporates only the essential set of operating system services
into the kernel. The rest of the operating system services are implemented in programs known as
'Servers', which run in user space. Memory management, timer systems and interrupt
handlers are the essential services which form part of the microkernel. The benefits of microkernel
based designs are –
o Robustness: If a problem is encountered in any of the services that run as servers, the service
can be reconfigured and restarted without restarting the entire OS. Here the chances of
corruption of kernel services are ideally zero.
o Configurability: Any service that runs as a server application can be changed without
the need to restart the whole system. This makes the system dynamically configurable.
The Real-Time kernel: The kernel of a Real-Time OS is referred to as the Real-Time kernel. The Real-Time
kernel is highly specialized and contains only the minimal set of services required for running user
applications/ tasks. The basic functions of a Real-Time kernel are listed below:
Task/ Process management
Task/ Process scheduling
Task/ Process synchronization
Error/ Exception handling
Memory management
Interrupt handling
Time management.
Task/ Process Management: Deals with setting up the memory space for the tasks, loading the
task's code into the memory space, allocating system resources, setting up a Task Control
Block (TCB) for the task, and task/ process termination/ deletion.
o A Task Control Block (TCB) is used for holding the information corresponding to a task.
TCB usually contains the following set of information:
Task ID: Task Identification Number
Task State: The current state of the task. (E.g. State = 'Ready' for a task which is
ready to execute)
Task Type: Indicates the type of the task. The task can be a
hard real time, soft real time or background task.
Task Priority: The priority of the task (E.g. Task priority = 1 for a task with priority = 1)
Task Context Pointer: Pointer used for context saving
Task Memory Pointers: Pointers to the code memory, data memory and stack
memory for the task
Task System Resource Pointers: Pointers to system resources (semaphores,
mutex, etc.) used by the task
Task Pointers: Pointers to other TCBs (TCBs for preceding, next and waiting
tasks)
Other Parameters: Other relevant task parameters.
o The parameters and implementation of the TCB is kernel dependent. The TCB
parameters vary across different kernels based on the task management implementation.
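The TCB fields listed above can be sketched as a C structure. This is only an illustration under stated assumptions: the type names, the field names, the state names other than 'Ready', and the helper functions `tcb_init` and `ready_list_insert` are hypothetical, and a lower priority number is assumed to mean a higher priority. Real kernels define their own layouts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { TASK_CREATED, TASK_READY, TASK_RUNNING,
               TASK_BLOCKED, TASK_COMPLETED } task_state_t;
typedef enum { TASK_HARD_RT, TASK_SOFT_RT, TASK_BACKGROUND } task_type_t;

typedef struct tcb {
    uint32_t     id;         /* Task ID                               */
    task_state_t state;      /* Task State                            */
    task_type_t  type;       /* Task Type                             */
    uint8_t      priority;   /* Task Priority (assumed: lower = higher) */
    void        *context;    /* Task Context Pointer                  */
    void        *code_mem;   /* Task Memory Pointers                  */
    void        *data_mem;
    void        *stack_mem;
    void        *resources;  /* Task System Resource Pointers         */
    struct tcb  *prev;       /* Task Pointers: links to other TCBs    */
    struct tcb  *next;
} tcb_t;

/* Fill in a TCB for a newly created task. */
static void tcb_init(tcb_t *t, uint32_t id, task_type_t type,
                     uint8_t priority, void *stack_mem)
{
    assert(t != NULL);
    t->id = id;
    t->state = TASK_CREATED;
    t->type = type;
    t->priority = priority;
    t->context = NULL;
    t->code_mem = NULL;
    t->data_mem = NULL;
    t->stack_mem = stack_mem;
    t->resources = NULL;
    t->prev = NULL;
    t->next = NULL;
}

/* Insert a TCB into a priority-ordered ready list; returns the new head. */
static tcb_t *ready_list_insert(tcb_t *head, tcb_t *t)
{
    tcb_t *cur;

    assert(t != NULL);
    t->state = TASK_READY;
    if (head == NULL || t->priority < head->priority) {
        t->prev = NULL;
        t->next = head;
        if (head != NULL)
            head->prev = t;
        return t;
    }
    cur = head;
    while (cur->next != NULL && cur->next->priority <= t->priority)
        cur = cur->next;
    t->next = cur->next;
    t->prev = cur;
    if (cur->next != NULL)
        cur->next->prev = t;
    cur->next = t;
    return head;
}
```

Linking TCBs through the `prev` and `next` pointers is one common way for a kernel to keep its ready and waiting queues without allocating extra list nodes.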
Task/ Process Scheduling: Deals with sharing the CPU among various tasks/ processes. A kernel
application called 'Scheduler' handles the task scheduling. The scheduler is an algorithm
implementation which performs efficient and optimal scheduling of tasks to provide
deterministic behavior.
Task/ Process Synchronization: Deals with synchronizing the concurrent access of a resource,
which is shared across multiple tasks and the communication between various tasks.
Error/ Exception Handling: Deals with registering and handling the errors that occur and the
exceptions raised during the execution of tasks.
o Insufficient memory, timeouts, deadlocks, deadline missing, bus error, divide by zero,
unknown instruction execution, etc. are examples of errors/ exceptions.
o Errors/ Exceptions can happen at the kernel level services or at task level.
Deadlock is an example of a kernel level exception, whereas timeout is an
example of a task level exception.
Deadlock is a situation where a set of processes are blocked because each
process is holding a resource and waiting for another resource acquired
by some other process.
Timeouts and retries are two techniques used together. The task retries an
event/ message a certain number of times; if no response is received after
exhausting the limit, the feature might be aborted.
o The OS kernel gives the information about the error in the form of a system call (API).
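The timeout and retry technique described above can be sketched in C as follows. This is a minimal illustration, not a kernel API: the callback type `try_fn`, the function `send_with_retry` and the demo callback `succeed_after` are hypothetical names. Each call of the callback is assumed to stand for one attempt that either gets a response before its timeout (returns true) or times out (returns false).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One attempt: returns true if a response arrived before the timeout. */
typedef bool (*try_fn)(void *ctx);

typedef enum { OP_OK, OP_ABORTED } op_result_t;

/* Retry the attempt up to max_attempts times; abort when the limit is exhausted.
 * The number of attempts actually made is stored in *attempts_made if non-NULL. */
static op_result_t send_with_retry(try_fn attempt, void *ctx,
                                   unsigned max_attempts,
                                   unsigned *attempts_made)
{
    unsigned n = 0;
    op_result_t result = OP_ABORTED;

    assert(attempt != NULL);
    while (n < max_attempts) {
        n++;
        if (attempt(ctx)) {
            result = OP_OK;
            break;
        }
    }
    if (attempts_made != NULL)
        *attempts_made = n;
    return result;
}

/* Demo attempt: times out until the counter pointed to by ctx reaches zero. */
static bool succeed_after(void *ctx)
{
    unsigned *remaining_failures = ctx;

    if (*remaining_failures == 0)
        return true;
    (*remaining_failures)--;
    return false;
}
```

With two simulated timeouts and a limit of five attempts, the third attempt succeeds; with a limit of three attempts and ten simulated timeouts, the operation is aborted.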
Memory Management: The memory management function of an RTOS kernel is slightly
different from that of General Purpose Operating Systems (GPOS).
o In general, the memory allocation time increases depending on the size of the block of
memory that needs to be allocated and the state of the allocated memory block. An RTOS
achieves predictable timing and deterministic behavior by compromising the
effectiveness of memory allocation.
o An RTOS generally uses a 'block' based memory allocation technique instead of the usual
dynamic memory allocation techniques used by a GPOS. The RTOS kernel uses blocks of
fixed size of dynamic memory and a block is allocated to a task on a need basis. The
blocks are stored in a 'Free buffer Queue'.
o Most RTOS kernels allow tasks to access any of the memory blocks without any
memory protection, to achieve predictable timing and avoid timing overheads. Some
commercial RTOS kernels offer memory protection as an option, and the kernel enters a
fail-safe mode when an illegal memory access occurs.
o Since a block of fixed size is always allocated to tasks
on a need basis and is taken as a unit, there will not be any memory
fragmentation issues.
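A fixed-size block allocator of the kind described above can be sketched in C as shown below. The block size, block count and function names are assumptions chosen for illustration, and the free blocks are kept in a simple last-in first-out list rather than a true queue. Both allocation and release take constant time, which is what gives the predictable timing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE  32u   /* bytes per block (assumed value)     */
#define BLOCK_COUNT 4u    /* blocks in the pool (assumed value)  */

/* A free block stores the link to the next free block inside itself. */
typedef union block {
    union block *next;
    uint8_t      payload[BLOCK_SIZE];
} block_t;

static block_t  pool[BLOCK_COUNT];
static block_t *free_list;

/* Put every block of the pool on the free list. */
static void pool_init(void)
{
    free_list = NULL;
    for (size_t i = BLOCK_COUNT; i > 0; i--) {
        pool[i - 1].next = free_list;
        free_list = &pool[i - 1];
    }
}

/* Take one block from the free list; returns NULL when the pool is empty. */
static void *pool_alloc(void)
{
    block_t *b = free_list;

    if (b != NULL)
        free_list = b->next;
    return b;
}

/* Return a block to the free list. */
static void pool_free(void *p)
{
    block_t *b = p;

    assert(p != NULL);
    b->next = free_list;
    free_list = b;
}
```

Because every request is served with one whole block, a released block can always satisfy the next request, so the pool does not fragment.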
Interrupt Handling: Deals with the handling of various interrupts. Interrupts inform the
processor that an external device or an associated task requires the immediate attention of the CPU.
o Interrupts can be either Synchronous or Asynchronous.
Interrupts which occur in sync with the currently executing task are known as
Synchronous interrupts. Usually the software interrupts fall under the
Synchronous Interrupt category.
Divide by zero, memory segmentation error, etc. are examples of
Synchronous interrupts.
For synchronous interrupts, the interrupt handler runs in the same context as the
interrupting task.
Interrupts which occur at any point of execution of any task, and are not in sync
with the currently executing task, are Asynchronous interrupts.
Timer overflow interrupts, serial data reception/ transmission interrupts,
etc. are examples of asynchronous interrupts.
For asynchronous interrupts, the interrupt handler is usually written as a separate
task (depending on the OS kernel implementation) and it runs in a different context.
Hence, a context switch happens while handling asynchronous interrupts.
o Priority levels can be assigned to the interrupts and each interrupt can be enabled or
disabled individually. Most RTOS kernels implement a 'Nested Interrupts'
architecture.
Time Management: Accurate time management is essential for providing a precise time reference
for all applications. The time reference to the kernel is provided by a high-resolution Real Time
Clock (RTC) hardware chip (hardware timer).
o The hardware timer is programmed to interrupt the processor/ controller at a fixed rate.
This timer interrupt is referred to as the 'Timer tick'. The 'Timer tick' is taken as the timing
reference by the kernel. The 'Timer tick' interval may vary depending on the hardware
timer. Usually, the 'Timer tick' varies in the microseconds range. The time parameters
for tasks are expressed as multiples of the 'Timer tick'.
o The System time is updated based on the 'Timer tick'. If the System time register is 32
bits wide and the 'Timer tick' interval is 1 microsecond, the System time register will
reset in
$2^{32} \times 10^{-6} / (24 \times 60 \times 60) \approx 0.0497$ days $\approx 1.19$ hours
o If the 'Timer tick' interval is 1 millisecond, the System time register will reset in
$2^{32} \times 10^{-3} / (24 \times 60 \times 60) \approx 49.7$ days $\approx 50$ days
o The 'Timer tick' interrupt is handled by the 'Timer Interrupt' handler of the kernel. The
'Timer tick' interrupt can be utilized for implementing the following actions:
Save the current context (context of the currently executing task)
Increment the System time register by one. Generate a timing error and reset the
System time register if the timer tick count is greater than the maximum range
available for the System time register.
Update the timers implemented in the kernel (increment or decrement the timer
registers for each timer depending on the count direction setting for each register:
increment registers with count direction setting = 'count up' and decrement
registers with count direction setting = 'count down')
Activate the periodic tasks which are in the idle state
Invoke the scheduler and schedule the tasks again based on the scheduling
algorithm
Delete all the terminated tasks and their associated data structures (TCBs)
Load the context for the first task in the ready queue. Due to the re-scheduling,
the ready task might be different from the task which was pre-empted
by the 'Timer Interrupt' task.
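The time-keeping part of the 'Timer tick' handling described above can be sketched in C. The sketch covers only the System time update, the rollover check and the count up/ count down software timers; context saving, task activation and scheduling are kernel specific and are left out. All names are hypothetical, and stopping a count-down timer at zero is an assumption of this sketch. The helper `rollover_seconds` reproduces the rollover arithmetic for an n-bit System time register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_TIMERS 2u

typedef enum { COUNT_UP, COUNT_DOWN } count_dir_t;

typedef struct {
    uint32_t    value;  /* current timer register value */
    count_dir_t dir;    /* count direction setting      */
} soft_timer_t;

static uint32_t     system_time;        /* System time register, in ticks */
static bool         timing_error;       /* set when the register rolls over */
static soft_timer_t timers[NUM_TIMERS]; /* timers implemented in the kernel */

/* Seconds until a 'bits'-wide tick counter rolls over. */
static double rollover_seconds(unsigned bits, double tick_seconds)
{
    double ticks = 1.0;

    assert(bits <= 64);
    for (unsigned i = 0; i < bits; i++)
        ticks *= 2.0;
    return ticks * tick_seconds;
}

/* Called once per 'Timer tick' interrupt. */
static void timer_tick_handler(void)
{
    /* Increment the System time; flag an error and reset on overflow. */
    if (system_time == UINT32_MAX) {
        timing_error = true;
        system_time = 0;
    } else {
        system_time++;
    }

    /* Update each kernel timer according to its count direction. */
    for (size_t i = 0; i < NUM_TIMERS; i++) {
        if (timers[i].dir == COUNT_UP)
            timers[i].value++;
        else if (timers[i].value > 0)
            timers[i].value--;   /* assumed: count-down timers stop at 0 */
    }
}
```

Calling `rollover_seconds(32, 1e-6)` gives about 4295 seconds (roughly 1.19 hours), and `rollover_seconds(32, 1e-3)` gives about 4.29 million seconds (roughly 49.7 days), matching the figures above.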
Hard Real-Time: Real-Time Operating Systems which strictly adhere to the timing
constraints for a task are referred to as hard real-time systems. A Hard Real Time system must meet
the deadlines for a task without any slippage. Missing any deadline may produce catastrophic
results for Hard Real Time Systems, including permanent data loss and irrecoverable damage to
the system/ users.
o Hard real-time systems emphasize the principle 'A late answer is a wrong answer'.
For example, Air bag control systems and Anti-lock Brake Systems (ABS) of
vehicles are typical examples of Hard Real Time Systems.
o Most of the Hard Real Time Systems are automatic.
Soft Real-Time: Real-Time Operating Systems that do not guarantee meeting deadlines, but
offer the best effort to meet the deadline, are referred to as soft real-time systems. Missing deadlines
for tasks is acceptable if the frequency of deadline missing is within the compliance limit of the
Quality of Service (QoS).
o Soft real-time systems emphasize the principle 'A late answer is an acceptable answer,
but it could have been done a bit faster'.
o An Automatic Teller Machine (ATM) is a typical example of a Soft Real Time System. If the
ATM takes a few seconds more than the ideal operation time, nothing fatal happens.
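The difference between the two classes can be expressed as a check on deadline misses. In the hedged C sketch below, a hard real-time task is considered to have failed on its first missed deadline, while a soft real-time task is acceptable as long as its miss ratio stays within a limit. The names and the idea of expressing the QoS compliance limit as a miss ratio are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { RT_HARD, RT_SOFT } rt_class_t;

typedef struct {
    unsigned jobs;    /* completed jobs observed                 */
    unsigned misses;  /* jobs that finished after their deadline */
} deadline_stats_t;

/* Record one completed job; both times are in timer ticks. */
static void record_job(deadline_stats_t *s, uint32_t finish, uint32_t deadline)
{
    assert(s != NULL);
    s->jobs++;
    if (finish > deadline)
        s->misses++;
}

/* Hard real-time: any miss is a failure.
 * Soft real-time: acceptable while misses / jobs <= max_miss_ratio. */
static bool timing_acceptable(rt_class_t cls, const deadline_stats_t *s,
                              double max_miss_ratio)
{
    assert(s != NULL);
    if (s->misses == 0)
        return true;
    if (cls == RT_HARD)
        return false;
    return (double)s->misses / (double)s->jobs <= max_miss_ratio;
}
```

For instance, one late job out of four fails the hard real-time check but passes the soft real-time check when the allowed miss ratio is 0.25.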
Process:
A 'Process' is a program, or part of it, in execution. A process is also known as an instance of a program
in execution. A process requires various system resources like the CPU for executing the process, memory
for storing the code corresponding to the process and associated variables, I/O devices for information
exchange, etc.
Structure of a Process: The concept of 'Process' leads to concurrent execution of tasks and
thereby efficient utilization of the CPU and other system resources. Concurrent execution is
achieved through the sharing of the CPU among the processes.
o A process mimics a processor in properties and holds a set of registers, process status, a
Program Counter (PC) to point to the next executable instruction of the process, a stack
for holding the local variables associated with the process and the code corresponding to
the process. This can be visualized as shown in the following Figure.
o A process, which inherits all the properties of the CPU, can be considered as a virtual
processor, awaiting its turn to have its properties switched into the physical processor.
When the process gets its turn, its registers and Program Counter register become
mapped to the physical registers of the CPU.
o The memory occupied by the process is segregated into three regions, namely Stack
memory, Data memory and Code memory (Figure, shown above).
The 'Stack' memory holds all temporary data such as variables local to the
process.
The 'Data' memory holds all global data for the process.
The 'Code' memory contains the program code (instructions) corresponding to
the process.
o On loading a process into the main memory, a specific area of memory is allocated for
the process. The stack memory usually starts at the highest memory address from the
memory area allocated for the process.
Process States & State Transition: The progress of a process from creation to termination is not a single step
operation. The process traverses a series of states during its transition from the newly
created state to the terminated state.
o The cycle through which a process changes its state from 'newly created' to 'execution
completed' is known as the 'Process Life Cycle'.
o The various states through which a process traverses during a Process Life Cycle
indicate the current status of the process with respect to time and also provide
information on what it is allowed to do next.
o The transition of a process from one state to another is known as 'State transition'. The
process states and state transition representation are shown in the following Figure.
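A process life cycle can be modelled in C as a small state machine. The text names only the 'newly created', 'Ready' and 'execution completed' states; the 'Running' and 'Blocked' states and the exact set of allowed transitions in the sketch below are assumptions based on the commonly used five-state model, and all identifiers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    PS_CREATED,    /* newly created                        */
    PS_READY,      /* waiting for its turn on the CPU      */
    PS_RUNNING,    /* currently executing (assumed state)  */
    PS_BLOCKED,    /* waiting for an event (assumed state) */
    PS_COMPLETED,  /* execution completed                  */
    PS_COUNT
} proc_state_t;

/* Returns true if the state transition from -> to is allowed. */
static bool transition_allowed(proc_state_t from, proc_state_t to)
{
    static const bool allowed[PS_COUNT][PS_COUNT] = {
        [PS_CREATED] = { [PS_READY] = true },
        [PS_READY]   = { [PS_RUNNING] = true },
        [PS_RUNNING] = { [PS_READY] = true,      /* pre-empted        */
                         [PS_BLOCKED] = true,    /* waits for event   */
                         [PS_COMPLETED] = true },
        [PS_BLOCKED] = { [PS_READY] = true },    /* event occurred    */
    };

    assert(from < PS_COUNT && to < PS_COUNT);
    return allowed[from][to];
}

/* Apply a transition if it is allowed; the state is unchanged otherwise. */
static bool proc_set_state(proc_state_t *state, proc_state_t to)
{
    assert(state != NULL);
    if (!transition_allowed(*state, to))
        return false;
    *state = to;
    return true;
}
```

Encoding the allowed transitions in a table keeps the rule "what the process is allowed to do next" in one place, so an illegal request such as moving directly from 'newly created' to 'Running' is rejected.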
Threads:
A thread is the primitive that can execute code. A thread is a single sequential flow of control within a
process. A thread is also known as a lightweight process.
A process can have many threads of execution. Different threads, which are part of a process,
share the same address space; meaning they share the data memory, code memory and heap
memory area.
Threads maintain their own thread status (CPU register values), Program Counter (PC) and stack.
The memory model for a process and its associated threads is given in the following figure.
The Concept of Multithreading: The process is split into multiple threads, each of which executes a
portion of the process; there will be a main thread, and the rest of the threads will be created within
the main thread.
o The multithreaded architecture of a process can be visualized with the thread-process
diagram, shown below.
o Use of multiple threads to execute a process brings the following advantages:
Better memory utilization: Multiple threads of the same process share the
address space for data memory. This also reduces the complexity of inter-thread
communication, since variables can be shared across the threads.
Since the process is split into different threads, when one thread enters a wait
state, the CPU can be utilized by other threads of the process that do not require
the event for which the waiting thread is waiting. This speeds up the
execution of the process.
Efficient CPU utilization: The CPU is engaged all the time.
Thread Standards: Thread standards deal with the different standards available for thread
creation and management. These standards are utilized by operating systems for thread
creation and thread management. It is a set of thread class libraries. The commonly available
thread class libraries are –
o POSIX Threads: POSIX stands for Portable Operating System Interface. The POSIX.4
standard deals with the Real Time extensions and the POSIX.4a standard deals with thread
extensions. The POSIX standard library for thread creation and management is
'Pthreads'. The 'Pthreads' library defines the set of POSIX thread creation and management
functions in the 'C' language. (Example 1 – Self study).
o Win32 Threads: Win32 threads are the threads supported by various flavors of Windows
Operating Systems. The Win32 Application Programming Interface (Win32 API)
libraries provide the standard set of Win32 thread creation and management functions.
Win32 threads are created with the API.
o Java Threads: Java threads are the threads supported by the Java programming language.
The Java thread class 'Thread' is defined in the package 'java.lang'. This package is
imported automatically by the Java compiler, so the thread creation functions supported by the
Java thread class can be used without an explicit import.
There are two ways of creating threads in Java: either by extending the base 'Thread'
class or by implementing an interface. Extending the Thread class allows inheriting the
methods and variables of the parent class (Thread class) only, whereas an interface provides a
way to achieve the requirements for a set of classes.
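As a small illustration of the 'Pthreads' library mentioned under POSIX Threads, the following C sketch creates several threads that update one shared variable. It uses the standard calls `pthread_create`, `pthread_join`, `pthread_mutex_lock` and `pthread_mutex_unlock`; the worker function, the constants and `run_workers` are hypothetical names chosen for this example. The shared counter lives in data memory and is visible to every thread, while each thread's loop variable lives on its own stack.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_WORKERS 4
#define INCREMENTS  1000

static long shared_counter;   /* data memory: shared by all threads */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread function: adds *arg increments to the shared counter. */
static void *worker(void *arg)
{
    const int *increments = arg;

    assert(increments != NULL);
    for (int i = 0; i < *increments; i++) {  /* i is on this thread's stack */
        pthread_mutex_lock(&counter_lock);   /* synchronize shared access   */
        shared_counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Create the worker threads from the main thread and wait for them.
 * Returns the final counter value, or -1 if a thread could not be created. */
static long run_workers(void)
{
    static const int increments = INCREMENTS;
    pthread_t tid[NUM_WORKERS];
    int created = 0;

    shared_counter = 0;
    for (int i = 0; i < NUM_WORKERS; i++) {
        if (pthread_create(&tid[i], NULL, worker, (void *)&increments) != 0)
            break;
        created++;
    }
    for (int i = 0; i < created; i++)
        pthread_join(tid[i], NULL);
    return (created == NUM_WORKERS) ? shared_counter : -1;
}
```

On many toolchains the program has to be built with the `-pthread` option. With the values assumed here, `run_workers()` returns 4000 when all four threads are created successfully.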
Thread Pre-emption: Thread pre-emption is the act of pre-empting the currently running thread
(stopping it temporarily). It is dependent on the Operating System. It is performed for sharing the
CPU time among all the threads. The execution switching among threads is known as 'Thread
context switching'. Threads fall into one of the following types:
o User Level Thread: User level threads do not have kernel/ Operating System support and
they exist only in the running process. A process may have multiple user level threads,
but the OS treats it as a single thread and will not switch the execution among its
different threads. It is the responsibility of the process to schedule each thread as and
when required. Hence, user level threads are non-preemptive at thread level from the OS
perspective.
o Kernel Level/ System Level Thread: Kernel level threads are individual units of
execution, which the OS treats as separate threads. The OS interrupts the execution of the
currently running kernel thread and switches the execution to another kernel thread based
on the scheduling policies implemented by the OS.
The execution switching (thread context switching) of user level threads happens
only when the currently executing user level thread is voluntarily blocked.
Hence, no OS intervention and no system calls are involved in the context switching
of user level threads. This makes context switching of user level threads very
fast.
Kernel level threads involve lots of kernel overhead and involve system calls for
context switching. However, kernel threads maintain a clear layer of abstraction
and allow threads to use system calls independently.
There are many ways of binding user level threads to kernel/ system level
threads. They are explained below:
Many-to-One Model: Many user level threads are mapped to a single
kernel thread. The kernel treats all user level threads as a single thread and
the execution switching among the user level threads happens when a
currently executing user level thread voluntarily blocks itself or
relinquishes the CPU. Solaris Green threads and GNU Portable Threads
are examples of this.
One-to-One Model: Each user level thread is bound to a kernel/ system
level thread. Windows XP/NT/2000 and Linux threads are examples of
One-to-One thread models.
Many-to-Many Model: In this model many user level threads are allowed
to be mapped to many kernel threads. Windows NT/2000 with
ThreadFiber package is an example for this.
Thread versus Process:
Thread | Process
Thread is a single unit of execution and is part of a process. | Process is a program in execution and contains one or more threads.
A thread does not have its own data memory and heap memory. | Process has its own code memory, data memory, and stack memory.
A thread cannot live independently; it lives within the process. | A process contains at least one thread.
There can be multiple threads in a process; the first (main) thread calls the main function and occupies the start of the stack memory of the process. | Threads within a process share the code, data and heap memory; each thread holds a separate memory area for stack.
Threads are very inexpensive to create. | Processes are very expensive to create; creation involves a lot of OS overhead.
Context switching is inexpensive and fast. | Context switching is complex, involves a lot of OS overhead and is comparatively slow.
If a thread expires, its stack is reclaimed by the process. | If a process dies, the resources allocated to it are reclaimed by the OS and all associated threads of the process also die.
Multitasking involves 'Context switching' (see the following Figure), 'Context saving' and
'Context retrieval'.
o The act of switching the CPU among the processes or changing the current execution context
is known as 'Context switching'.
o The act of saving the current context (details like Register details, Memory details,
System Resource Usage details, Execution details, etc.) for the currently running
process at the time of CPU switching is known as 'Context saving'.
o The process of retrieving the saved context details for a process, which is going to be
executed due to CPU switching, is known as 'Context retrieval'.
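The three terms above can be illustrated with a small software model. This is only a sketch: the structures cpu_context_t and process_t and the function context_switch are hypothetical, and a real kernel saves far more state (status flags, memory maps, resource usage) and does so in processor specific assembly code.

```c
/* Simplified CPU state used only for illustration. */
typedef struct {
    unsigned int pc;        /* program counter */
    unsigned int sp;        /* stack pointer */
    unsigned int regs[4];   /* general-purpose registers */
} cpu_context_t;

typedef struct {
    int id;
    cpu_context_t saved;    /* context kept while the process is not running */
} process_t;

/* Context switching = context saving for the outgoing process
   followed by context retrieval for the incoming process. */
void context_switch(cpu_context_t *cpu, process_t *from, process_t *to)
{
    from->saved = *cpu;     /* context saving */
    *cpu = to->saved;       /* context retrieval */
}
```

After context_switch(&cpu, &a, &b) the model CPU holds the register values that process b had when it last ran, and the values of process a are preserved so that a later switch back resumes it exactly where it stopped.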
Types of Multitasking:
Depending on how the task/ process execution switching is implemented, multitasking can be
classified into:
Co-operative Multitasking: Co-operative multitasking is the most primitive form of multitasking
in which a task/ process gets a chance to execute only when the currently executing task/ process
voluntarily relinquishes the CPU. In this method, any task/ process can hold the CPU for as much
time as it wants. Since each task depends on the mercy of the other tasks for
getting CPU time for execution, it is known as co-operative multitasking. If the currently
executing task is non-cooperative, the other tasks may have to wait for a long time to get the
CPU.
Preemptive Multitasking: Preemptive multitasking ensures that every task/ process gets a chance
to execute. When and how much time a process gets is dependent on the implementation of the
preemptive scheduling. As the name indicates, in preemptive multitasking, the currently running
task/process is preempted to give a chance to other tasks/process to execute. The preemption of
task may be based on time slots or task/ process priority.
Non-preemptive Multitasking: The process/ task, which is currently given the CPU time, is
allowed to execute until it terminates (enters the 'Completed' state) or enters the 'Blocked/ Wait'
state, waiting for an I/O. Co-operative and non-preemptive multitasking differ in their
behavior when the task is in the 'Blocked/ Wait' state. In co-operative multitasking, the currently
executing process/ task need not relinquish the CPU when it enters the 'Blocked/ Wait' state,
waiting for an I/O, a shared resource access or an event to occur, whereas in non-preemptive
multitasking the currently executing task relinquishes the CPU when it waits for an I/O.
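Co-operative multitasking, as described above, can be sketched as a loop that calls each task in turn and relies on the task to return. The names task_t, run_cooperative and countdown_task are hypothetical; the sketch assumes that every task does one slice of work per call and returns 0 when it has finished.

```c
#include <stddef.h>

/* A task does one slice of work and returns, which is how it
   voluntarily relinquishes the CPU. It returns 0 when finished. */
typedef int (*task_fn)(void *state);

typedef struct {
    task_fn run;
    void   *state;
    int     alive;
} task_t;

/* Runs the tasks round-robin until all have finished and returns the
   number of slices executed. A task that never returns would keep the
   CPU forever and starve the other tasks. */
int run_cooperative(task_t *tasks, size_t count)
{
    int slices = 0;
    size_t remaining = count;

    for (size_t i = 0; i < count; i++)
        tasks[i].alive = 1;

    while (remaining > 0) {
        for (size_t i = 0; i < count; i++) {
            if (!tasks[i].alive)
                continue;
            slices++;
            if (tasks[i].run(tasks[i].state) == 0) {
                tasks[i].alive = 0;
                remaining--;
            }
        }
    }
    return slices;
}

/* Example task: counts its state down by one per slice. */
int countdown_task(void *state)
{
    int *n = state;
    if (*n > 0)
        (*n)--;
    return *n > 0;
}
```

Nothing in run_cooperative can take the CPU away from a task, which is exactly the weakness of the co-operative model; a preemptive scheduler would instead interrupt the task on a timer tick or when a higher priority task becomes ready.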
TASK COMMUNICATION:
In a multitasking system, multiple tasks/ processes run concurrently (in pseudo parallelism) and each
process may or may not interact with the others. Based on the degree of interaction, the processes/ tasks running
on an OS are classified as:
Co-operating Processes: In the co-operating interaction model, one process requires the inputs
from other processes to complete its execution.
Competing Processes: The competing processes do not share anything among themselves but
they share the system resources. The competing processes compete for the system resources such
as file, display device, etc.
o The co-operating processes exchange information and communicate through the
following methods:
Co-operation through Sharing: Data is exchanged through some shared resources.
Co-operation through Communication: No data is shared between the processes,
but they communicate for execution synchronization.
The mechanism through which tasks/ processes communicate with each other is known as Inter Process/ Task
Communication (IPC). IPC is essential for process co-ordination. The various types of IPC mechanisms
adopted by processes are kernel (Operating System) dependent. They are explained below.
IPC Mechanism - Shared Memory:
Processes share some area of the memory to communicate among them (see the following Figure).
Information to be communicated by the process is written to the shared memory area. Processes which
require this information can read the same from the shared memory area.
The implementation of shared memory is kernel dependent. Different mechanisms are adopted by
different kernels for implementing this; a few of them are as follows:
1. Pipes: A 'Pipe' is a section of the shared memory used by processes for communicating. Pipes
follow the client-server architecture. A process which creates a pipe is known as the pipe server
and a process which connects to a pipe is known as a pipe client. A pipe can be considered as a
medium for information flow and has two conceptual ends. It can be unidirectional, allowing
information flow in one direction, or bidirectional, allowing information flow in both directions. A
unidirectional pipe allows the process connected at one end of the pipe to write to the pipe
and the process connected at the other end of the pipe to read the data, whereas a
bidirectional pipe allows both reading and writing at either end. The unidirectional pipe can be
visualized as a one-way channel with the writing process at one end and the reading process at the other.
2. Mailbox: A mailbox is a special implementation of a message queue. It is usually used for one way
communication; only a single message is exchanged through a mailbox, whereas a 'message
queue' can be used for exchanging multiple messages. One task/ process creates the mailbox
and other tasks/ processes can subscribe to this mailbox for getting message notification. The
implementation of the mailbox is OS kernel dependent. The MicroC/ OS-II RTOS
implements the mailbox as a mechanism for inter task communication.
3. Signalling: Signals are used as an asynchronous notification mechanism. Signals are mainly
used for the execution synchronization of tasks/ processes. Signals do not carry any data
and are not queued. The implementation of signals is OS kernel dependent; the VxWorks
RTOS kernel implements 'signals' for inter process communication.
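The unidirectional pipe described in item 1 can be tried out with the POSIX pipe() call. The following is a minimal sketch, assuming a POSIX system; for brevity both ends are used by the same process, and the function name pipe_round_trip is hypothetical. The message is assumed to be small enough to fit into the kernel's pipe buffer, and out_size must be at least 1.

```c
#include <string.h>
#include <unistd.h>

/* Writes msg into a pipe and reads it back through the other end.
   Returns the number of bytes read, or -1 on error. */
ssize_t pipe_round_trip(const char *msg, char *out, size_t out_size)
{
    int fds[2];                 /* fds[0] = read end, fds[1] = write end */
    ssize_t n;

    if (pipe(fds) != 0)
        return -1;

    if (write(fds[1], msg, strlen(msg)) < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);              /* writer is done; reader will see end of data */

    n = read(fds[0], out, out_size - 1);
    close(fds[0]);
    if (n < 0)
        return -1;
    out[n] = '\0';
    return n;
}
```

In a real application the two descriptors would be used by different processes, typically a parent and a child created with fork(), with each side closing the end it does not use.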
IPC Mechanism - Remote Procedure Call (RPC) and Sockets: Remote Procedure Call is the Inter
Process Communication (IPC) mechanism used by a process to call a procedure of another process
running on the same CPU or on a different CPU which is interconnected in a network. In
object oriented language terminology, RPC is also known as Remote Invocation or Remote Method Invocation
(RMI). The CPU/ process containing the procedure which needs to be invoked remotely is known as the
server. The CPU/ process which initiates an RPC request is known as the client.
In order to make the RPC communication compatible across all platforms, it should adhere to
certain standard formats.
Interface Definition Language (IDL) defines the interfaces for RPC. Microsoft Interface
Definition Language (MIDL) is the IDL implementation from Microsoft for all Microsoft
platforms.
The RPC communication can be either Synchronous (Blocking) or Asynchronous (Non-
blocking).
Sockets are used for RPC communication. A socket is a logical endpoint in a two-way communication link
between two applications running on a network. A port number is associated with a socket so that the
network layer of the communication channel can deliver the data to the designated application. Sockets
are of different types, namely Internet sockets (INET), UNIX sockets, etc.
The INET socket works on Internet communication protocols. TCP/IP, UDP, etc., are the
communication protocols used by INET sockets.
INET sockets are classified into:
o Stream Sockets: are connection oriented and use TCP to establish a reliable
connection.
o Datagram Sockets: are connectionless and rely on UDP to exchange individual datagrams.
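The two kinds of INET socket can be created with the BSD socket API. The sketch below assumes a POSIX system; the helper names open_inet_socket and socket_type are hypothetical. It only creates the endpoints and asks the kernel which type each one has; it does not connect to or send data to any peer.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Opens an INET socket: SOCK_STREAM is paired with TCP,
   SOCK_DGRAM with UDP. Returns the descriptor or -1 on error. */
int open_inet_socket(int kind)
{
    int protocol = (kind == SOCK_STREAM) ? IPPROTO_TCP : IPPROTO_UDP;
    return socket(AF_INET, kind, protocol);
}

/* Returns the socket type reported by the kernel, or -1 on error. */
int socket_type(int fd)
{
    int type = -1;
    socklen_t len = sizeof type;

    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) != 0)
        return -1;
    return type;
}
```

A stream socket would next call connect() (client) or bind(), listen() and accept() (server) before exchanging data, whereas a datagram socket can use sendto() and recvfrom() directly without establishing a connection. Each descriptor should be released with close() when it is no longer needed.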
TASK SYNCHRONIZATION:
In a multitasking environment, multiple processes run concurrently and share the system resources. Also,
the processes may communicate with each other through different IPC mechanisms. Hence, there may be
situations in which two processes try to access a shared memory area, where one process tries to write to a
memory location while the other process is trying to read from the same memory location. This will lead
to unexpected results.
The solution is to make each process aware of the access of a shared resource. The act of making the processes
aware of the access of shared resources by each process, in order to avoid conflicts, is known as "Task/ Process
Synchronization".
Task/ Process Synchronization is essential for:
1. Avoiding conflicts in resource access (racing, deadlock, etc.) in a multitasking environment.
2. Ensuring proper sequence of operation across processes.
3. Establishing proper communication between processes.
The code memory area which holds the program instructions (piece of code) for accessing a shared
resource is known as the 'Critical Section'. In order to synchronize the access to shared resources, the access
to the critical section should be exclusive.
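Exclusive access to a critical section is commonly enforced with a mutual exclusion object (mutex). The following is a minimal sketch, assuming a POSIX system with Pthreads; the names shared_counter, counter_lock, increment_many and run_two_incrementers are hypothetical. Two threads increment the same variable, and the lock ensures that only one of them is inside the critical section at any time.

```c
#include <pthread.h>

#define INCREMENTS 100000

static long shared_counter;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *increment_many(void *unused)
{
    (void)unused;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);     /* enter critical section */
        shared_counter++;
        pthread_mutex_unlock(&counter_lock);   /* leave critical section */
    }
    return NULL;
}

/* Runs two threads over the same counter and returns the final
   value, or -1 if a thread could not be created. */
long run_two_incrementers(void)
{
    pthread_t a, b;

    shared_counter = 0;
    if (pthread_create(&a, NULL, increment_many, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, increment_many, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return shared_counter;
}
```

With the lock in place the result is always 2 × INCREMENTS. Without the two mutex calls the final value may be smaller, because increments from the two threads can overwrite each other.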
Task Communication/ Synchronization Issues:
Various synchronization issues may arise in a multitasking environment, if processes are not
synchronized properly in shared resource access, such as:
1. Racing: Look into the following piece of code:
#include <stdio.h>
#include <windows.h>
//****************************************************************
// counter is an integer variable and Buffer is a byte array shared
// between two threads, Process A and Process B.
char Buffer[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
short int counter = 0;
//****************************************************************
// Process A: counts the positive entries in the first half of Buffer
DWORD WINAPI Process_A(LPVOID param)
{
    int i;
    (void)param;
    for (i = 0; i < 5; i++)
    {
        if (Buffer[i] > 0)
            counter++;      // unprotected update of the shared variable
    }
    return 0;
}
// Process B: counts the positive entries in the second half of Buffer
DWORD WINAPI Process_B(LPVOID param)
{
    int j;
    (void)param;
    for (j = 5; j < 10; j++)
    {
        if (Buffer[j] > 0)
            counter++;      // unprotected update of the shared variable
    }
    return 0;
}
// Main Thread.
int main()
{
    DWORD id;
    CreateThread(NULL, 0, Process_A, NULL, 0, &id);
    CreateThread(NULL, 0, Process_B, NULL, 0, &id);
    Sleep(100000);
    return 0;
}
From a programmer's perspective, the value of counter will be 10 at the end of execution of
processes A and B. But this need not always be the case.
o The program statement counter++; looks like a single statement from a high level
programming language (C Language) perspective. The low level implementation of this
statement is dependent on the underlying processor instruction set and the (cross) compiler in
use. The low level implementation of the high level program statement counter++; under
Windows XP operating system running on an Intel Centrino Duo processor is given below.
mov eax, dword ptr [ebp-4] ;Load counter in Accumulator
add eax, 1 ; Increment Accumulator by 1
mov dword ptr [ebp-4], eax ;Store counter with Accumulator
o At the processor instruction level, the value of the variable counter is loaded to the
Accumulator register (EAX Register). The memory variable counter is represented using a
pointer. The base pointer register (EBP Register) is used for pointing to the memory variable
counter. After loading the contents of the variable counter to the Accumulator, the
Accumulator content is incremented by one using the add instruction. Finally the content of
the Accumulator is stored back to the memory location which represents the variable counter. Both
processes, Process A and Process B, contain the program statement counter++;
which translates into the following machine instructions in each process:
Process A                      Process B
mov eax, dword ptr [ebp-4]     mov eax, dword ptr [ebp-4]
add eax, 1                     add eax, 1
mov dword ptr [ebp-4], eax     mov dword ptr [ebp-4], eax
o Imagine a situation where a process switching (context switching) happens from Process A to
Process B when Process A is executing the counter++; statement. Process A accomplishes
the counter++; statement through three different low level instructions. Now imagine that the
process switching happened at the point, where Process A executed the low level instruction
mov eax, dword ptr [ebp-4] and is about to execute the next instruction add eax, 1. The
scenario is illustrated in the following Figure.
o Process B increments the shared variable 'counter' in the middle of the operation where
Process A tries to increment it. When Process A gets the CPU time for execution, it starts
from the point where it got interrupted (if Process B is also using the same registers eax and
ebp for executing the counter++; instruction, the original content of these registers will be saved
as part of context saving and it will be retrieved back as part of the context retrieval, when
Process A gets the CPU for execution. Hence the content of eax and ebp remains intact
irrespective of context switching). Though the variable counter is incremented by Process B,
Process A is unaware of it and it increments the variable with the old value. This leads to the
loss of one increment for the variable counter.
2. Deadlock: Deadlock is the condition in which a process is waiting for a resource held by another
process which is waiting for a resource held by the first process; hence, none of the processes is
able to make any progress in its execution.
o Process A holds a resource 'x' and it wants a resource 'y' held by Process B. Process B is
currently holding resource 'y' and it wants the resource 'x' which is currently held by Process
A. Both hold their respective resources and each waits for the resource held by
the other process.
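The lost increment described under Racing can be reproduced deterministically by modelling the three machine instructions as separate steps and interleaving them by hand. This is a single-threaded simulation, not real concurrent code; the names sim_process_t, load, add_one, store and interleaved_increment are hypothetical.

```c
/* Each simulated process has its own copy of the accumulator, which is
   what context saving and retrieval guarantee on a real CPU. */
typedef struct {
    int eax;
} sim_process_t;

/* mov eax, [counter] */
static void load(sim_process_t *p, const int *counter) { p->eax = *counter; }
/* add eax, 1 */
static void add_one(sim_process_t *p) { p->eax += 1; }
/* mov [counter], eax */
static void store(const sim_process_t *p, int *counter) { *counter = p->eax; }

/* Process A is switched out right after its load, Process B then runs
   counter++ completely, and finally A resumes with its stale eax.
   Returns the final value of counter. */
int interleaved_increment(int start)
{
    int counter = start;
    sim_process_t a, b;

    load(&a, &counter);      /* A: load, then context switch to B */
    load(&b, &counter);      /* B: executes the whole counter++ */
    add_one(&b);
    store(&b, &counter);
    add_one(&a);             /* A resumes with the old value */
    store(&a, &counter);
    return counter;          /* start + 1 instead of start + 2 */
}
```

Starting from 0, the function returns 1 although two increments were executed, which is the loss of one increment explained above. Making the load, add and store sequence exclusive, for example with a mutex, removes this interleaving.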
Functional Requirements:
Processor Support: Not every RTOS supports every kind of processor architecture.
It is essential to ensure that the RTOS supports the target processor.
Memory Requirements: The OS requires ROM memory for holding the OS files and it is
normally stored in a non-volatile memory like FLASH. OS also requires working memory RAM
for loading the OS services. Since embedded systems are memory constrained, it is essential to
evaluate the minimal ROM and RAM requirements for the OS under consideration.
Real-time Capabilities: It is not mandatory that the operating system for every embedded system
be Real-time, and not all embedded operating systems are 'Real-time' in behavior. The task/
process scheduling policies play an important role in the 'Real-time' behavior of an OS. Analyze
the real-time capabilities of the OS under consideration and the standards met by the operating
system for real-time capabilities.
Kernel and Interrupt Latency: The kernel of the OS may disable interrupts while executing
certain services and it may lead to interrupt latency. For an embedded system whose response
requirements are high, this latency should be minimal.
Inter Process Communication and Task Synchronization: The implementation of Inter Process
Communication and Synchronization is OS kernel dependent. Certain kernels may provide a
bunch of options whereas others provide very limited options. Certain kernels implement policies
for avoiding priority inversion issues in resource sharing.
Modularization Support: Most operating systems provide a bunch of features. At times not all of them
are necessary for the functioning of an embedded product. It is very useful if the OS
supports modularisation, wherein the developer can choose the essential modules and
re-compile the OS image for functioning. Windows CE is an example of a highly modular
operating system.
Support for Networking and Communication: The OS kernel may provide stack implementation
and driver support for a bunch of communication interfaces and networking. Ensure that the OS
under consideration provides support for all the interfaces required by the embedded product.
Development Language Support: Certain operating systems include the run time libraries
required for running applications written in languages like Java and C#. A Java Virtual Machine
(JVM) customized for the Operating System is essential for running Java applications. Similarly
the .NET Compact Framework (.NETCF) is required for running Microsoft .NET applications on
top of the Operating System. The OS may include these components as built-in components; if
not, check the availability of the same from a third party vendor for the OS under consideration.
Non-functional Requirements:
Custom Developed or Off the Shelf: Depending on the OS requirement, it is possible to go for the
complete development of an operating system suiting the embedded system needs or use an off
the shelf, readily available operating system, which is either a commercial product or an Open
Source product, which is in close match with the system requirements. Sometimes it may be
possible to build the required features by customizing an Open Source OS. The decision on which
to select is purely dependent on the development cost, licensing fees for the OS, development
time and availability of skilled resources.
Cost: The total cost for developing or buying the OS and maintaining it in terms of commercial
product and custom build needs to be evaluated before taking a decision on the selection of OS.
Development and Debugging Tools Availability: The availability of development and debugging
tools is a critical decision making factor in the selection of an OS for embedded design. Certain
Operating Systems may be superior in performance, but the availability of tools for supporting
the development may be limited. Explore the different tools available for the OS under
consideration.
Ease of Use: How easy it is to use a commercial RTOS is another important feature that needs to
be considered in the RTOS selection.
After Sales: For a commercial embedded RTOS, after-sales support in the form of e-mail, on-call services,
etc., for bug fixes, critical patch updates and production issues should be
analyzed thoroughly.
The sequence of operations for embedding the firmware with a programmer is listed below:
1. Connect the programming device to the specified port of PC (USB/COM port/Parallel port)
2. Power up the device (Most of the programmers incorporate LED to indicate Device power up.
Ensure that the power indication LED is ON)
3. Execute the programming utility on the PC and ensure proper connectivity is established between
PC and programmer. In case of error turn off device power and try connecting it again
4. Unlock the ZIF socket by turning the lock pin
5. Insert the device to be programmed into the open socket as per the insert diagram shown on the
programmer
6. Lock the ZIF socket
7. Select the device name from the list of supported devices
8. Load the hex file which is to be embedded into the device
9. Program the device by 'Program' option of utility program
10. Wait till the completion of programming operation (Till busy LED of programmer is off)
11. Ensure that programming is successful by checking the status LED on the programmer (usually
'Green' for success and 'Red' for error condition) or by noticing the feedback from the utility
program
12. Unlock the ZIF socket and take the device out of the programmer.
Now the firmware is successfully embedded into the device. Insert the device into the board, power up the
board and test it for the required functionalities. Note that most programmers support
only Dual Inline Package (DIP) chips, since their ZIF sockets are designed to accommodate only DIP chips.
An option for setting firmware protection is available in the programming utility. If you want the
firmware to be protected against unwanted external access, and if the device supports memory
protection, enable the memory protection in the utility before programming the device.
The programmer usually erases the existing content of the chip before programming the chip. Only
EEPROM and FLASH memory chips are erasable by the programmer.
The major drawback of out-of-circuit programming is the high development time. Whenever the firmware
is changed, the chip must be taken out of the development board for re-programming. This is tedious
and the chip is prone to damage due to frequent insertion and removal.
The out-of-system programming technique is used for firmware integration for low end embedded
products which run without an operating system. Out-of-circuit programming is commonly used for
development of low volume products and Proof of Concept (PoC) product development.
In System Programming with SPI Protocol: Devices with SPI (Serial Peripheral Interface) ISP (In
System Programming) support contain a built-in SPI interface and on-chip EEPROM or FLASH
memory. The primary I/O lines involved in SPI-In System Programming are listed below:
MOSI – Master Out Slave In
MISO – Master In Slave Out
SCK – Serial Clock
RST – Reset of Target Device
GND – Ground of Target Device
The PC acts as the master and the target device acts as the slave in ISP. The program data is sent to the MOSI pin
of the target device and the device acknowledgement originates from the MISO pin of the device. The SCK pin
acts as the clock for data transfer. A utility program can be developed on the PC side to generate the
above signal lines.
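A PC side utility that generates these signal lines in software ("bit-banging") can be sketched as follows. The structure spi_pins_t and the loopback functions are hypothetical; on real hardware the three hooks would drive and read the programmer's I/O lines. The sketch assumes that data is shifted most significant bit first and that the target samples MOSI on the rising edge of SCK, which must be checked against the data sheet of the actual device.

```c
#include <stdint.h>

/* Hypothetical pin-access hooks supplied by the caller. */
typedef struct {
    void (*set_mosi)(int level);
    void (*set_sck)(int level);
    int  (*get_miso)(void);
} spi_pins_t;

/* Shifts one byte out on MOSI, MSB first, while sampling MISO after
   each rising SCK edge. Returns the byte received from the target. */
uint8_t spi_transfer(const spi_pins_t *pins, uint8_t out)
{
    uint8_t in = 0;

    for (int bit = 7; bit >= 0; bit--) {
        pins->set_mosi((out >> bit) & 1);
        pins->set_sck(1);                    /* rising edge: target latches MOSI */
        in = (uint8_t)((in << 1) | (pins->get_miso() & 1));
        pins->set_sck(0);                    /* falling edge: next bit may change */
    }
    return in;
}

/* Loopback stub for trying the routine without hardware:
   MISO simply mirrors the last level written to MOSI. */
static int loop_level;
void loop_set_mosi(int level) { loop_level = level; }
void loop_set_sck(int level)  { (void)level; }
int  loop_get_miso(void)      { return loop_level; }
```

With the loopback hooks, spi_transfer returns the byte that was sent, which is a convenient self-test before real pin drivers are connected. A real utility would also insert delays between the clock edges to respect the maximum SCK frequency of the target.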
Standard SPI-ISP utilities are freely available on the internet, so there is no need to write
your own program. For ISP operations, the target device needs to be powered up in a pre-defined sequence.
The power up sequence for In System Programming for Atmel's AT89S series microcontroller family is
listed below:
1. Apply supply voltage between VCC and GND pins of target chip
2. Set RST pin to "HIGH" state
3. If a crystal is not connected across pins XTAL1 and XTAL2, apply a 3 MHz to 24 MHz clock to
XTAL1 pin and wait for at least 10 milliseconds
4. Enable serial programming by sending the Programming Enable serial instruction to pin MOSI/
P1.5. The frequency of the shift clock supplied at pin SCK/ P1.7 needs to be less than the CPU
clock at XTAL1 divided by 40
5. The Code or Data array is programmed one byte at a time by supplying the address and data
together with the appropriate Write instruction. The selected memory location is first erased
before the new data is written. The write cycle is self-timed and typically takes less than 2.5 ms at
5V
6. Any memory location can be verified by using the Read instruction, which returns the content at
the selected address at serial output MISO/ P1.6
7. After successfully programming the device, set RST pin low or turn off the chip power supply
and turn it ON to commence the normal operation.
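The shift clock limit in step 4 is a simple division, which a programming utility can check before it starts. The sketch below applies the divide-by-40 rule exactly as quoted above; the function names max_sck_hz and sck_is_valid are hypothetical, and the divisor should be confirmed against the data sheet of the specific device.

```c
/* Upper bound (exclusive) for the shift clock on SCK, using the
   divide-by-40 rule quoted in step 4. */
unsigned long max_sck_hz(unsigned long cpu_clock_hz)
{
    return cpu_clock_hz / 40UL;
}

/* Returns 1 if the proposed SCK frequency satisfies the rule, 0 otherwise. */
int sck_is_valid(unsigned long cpu_clock_hz, unsigned long sck_hz)
{
    return sck_hz > 0 && sck_hz < max_sck_hz(cpu_clock_hz);
}
```

For a 12 MHz clock at XTAL1 the limit is 12,000,000 / 40 = 300,000 Hz, so SCK must stay below 300 kHz. At the lower end of the permitted range, a 3 MHz clock gives a limit of 75 kHz.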
The key player behind ISP is a factory programmed memory (ROM) called 'Boot ROM'. The Boot ROM
normally resides at the top end of code memory space and its size is in the order of a few Kilo Bytes (for a
controller with 64K code memory space and 1K Boot ROM, the Boot ROM resides at memory location
FC00H to FFFFH). It contains a set of low-level instruction APIs and these APIs allow the processor/
controller to perform the FLASH memory programming, erasing and reading operations. The contents of
the Boot ROM are provided by the chip manufacturer and the same is masked into every device.
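The address range in the example above follows directly from the two sizes. The sketch below assumes, as stated, that the Boot ROM occupies the top end of the code memory space; the type boot_rom_range_t and the function boot_rom_range are hypothetical names.

```c
#include <stdint.h>

typedef struct {
    uint32_t start;   /* first Boot ROM address */
    uint32_t end;     /* last Boot ROM address */
} boot_rom_range_t;

/* Computes the Boot ROM address range, assuming it sits at the top
   of a code memory space that starts at address 0. */
boot_rom_range_t boot_rom_range(uint32_t code_space_bytes, uint32_t boot_rom_bytes)
{
    boot_rom_range_t r;

    r.end = code_space_bytes - 1;
    r.start = code_space_bytes - boot_rom_bytes;
    return r;
}
```

For 64K of code space (65,536 bytes) and a 1K Boot ROM (1,024 bytes), the start address is 65,536 − 1,024 = 64,512 = FC00H and the end address is 65,535 = FFFFH, matching the range given in the text.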
The prototype/ evaluation/ production version must pass through a varied set of tests to verify that
the embedded hardware and firmware function as expected. The bring up process includes:
basic hardware spot checks/ validations to make sure that the individual components and busses/
interconnects are operational, which involves checking power, clocks, and basic functional
connectivity;
basic firmware verification to make sure that the processor is fetching the code and the firmware
execution is happening in the expected manner;
running advanced validations such as memory validations, signal integrity validation, etc.
DISASSEMBLER/ DECOMPILER:
A disassembler is a utility program which converts machine codes into target processor specific Assembly
codes/ instructions. The process of converting machine codes into Assembly code is known as
'Disassembling'. In operation, disassembling is complementary to assembling/ cross-assembling.
A decompiler is the utility program for translating machine codes into corresponding high level language
instructions. A decompiler performs the reverse operation of a compiler/ cross-compiler.
The disassemblers/ decompilers for different family of processors/ controllers are different.
Disassemblers/ Decompilers are deployed in reverse engineering. Reverse engineering is the process of
revealing the technology behind the working of a product. Reverse engineering in Embedded Product
development is employed to find out the secret behind the working of popular proprietary products.
Disassemblers /decompilers help the reverse engineering process by translating the embedded firmware
into Assembly/ high level language instructions.
Disassemblers/ Decompilers are powerful tools for analyzing the presence of malicious codes (virus
information) in an executable image. Disassemblers/ Decompilers are available as either freeware tools
readily available for free download from internet or as commercial tools.
It is not possible for a disassembler/ decompiler to generate an exact replica of the original assembly
code/ high level source code in terms of the symbolic constants and comments used. However
disassemblers/ decompilers generate a source code which is somewhat matching to the original source
code from which the binary code is generated.
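The basic idea of disassembling, a table driven translation from opcode bytes back to mnemonics, can be sketched for a handful of 8051 opcodes. This is a toy illustration, not a usable tool: the function name disassemble_one is hypothetical, only five opcodes are decoded, and every other byte is emitted as a raw data byte.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Decodes the instruction at code[0] into text and returns its length
   in bytes (0 if the buffer ends before the instruction is complete). */
size_t disassemble_one(const uint8_t *code, size_t len, char *out, size_t out_size)
{
    if (len == 0)
        return 0;

    switch (code[0]) {
    case 0x00: snprintf(out, out_size, "NOP");   return 1;
    case 0x04: snprintf(out, out_size, "INC A"); return 1;
    case 0x22: snprintf(out, out_size, "RET");   return 1;
    case 0x24:                                   /* ADD A, #immediate */
        if (len < 2) return 0;
        snprintf(out, out_size, "ADD A, #0x%02X", (unsigned)code[1]);
        return 2;
    case 0x74:                                   /* MOV A, #immediate */
        if (len < 2) return 0;
        snprintf(out, out_size, "MOV A, #0x%02X", (unsigned)code[1]);
        return 2;
    default:                                     /* not in this toy table */
        snprintf(out, out_size, "DB 0x%02X", (unsigned)code[0]);
        return 1;
    }
}
```

Feeding the bytes 74 05 24 03 22 through the function one instruction at a time yields MOV A, #0x05, then ADD A, #0x03, then RET. Note that the output contains no labels, symbolic constants or comments, which illustrates why the original source cannot be recovered exactly.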
SIMULATORS, EMULATORS AND DEBUGGING:
Simulators and emulators are two important tools used in embedded system development.
A simulator is a software tool used for simulating the various conditions for checking the
functionality of the application firmware. The Integrated Development Environment (IDE) itself
usually provides simulator support, which helps in debugging the firmware and checking its
required functionality. In certain scenarios, simulator refers to a soft model (GUI model) of the
embedded product.
o For example, if the product under development is a handheld device, to test the
functionalities of the various menu and user interfaces, a soft form model of the product
with all UI as given in the end product can be developed in software. Soft phone is an
example for such a simulator.
An emulator is a hardware device which emulates the functionalities of the target device and allows
real time debugging of the embedded firmware in a hardware environment.
Simulators:
Simulators simulate the target hardware and the firmware execution can be inspected using simulators.
The features of simulator based debugging are listed below.
1. Purely software based
2. Doesn't require a real target system
3. Very primitive (Lack of featured I/O support. Everything is a simulated one)
4. Lack of Real-time behavior.
Advantages of Simulator Based Debugging: Simulator based debugging techniques are simple
and straightforward. The major advantages of simulator based firmware debugging techniques are
explained below.
No Need for Original Target Board: Simulator based debugging technique is purely software
oriented. IDE's software support simulates the CPU of the target board. User only needs to know
about the memory map of various devices within the target board and the firmware should be
written on the basis of it. Since the real hardware is not required, firmware development can start
well in advance immediately after the device interface and memory maps are finalized. This saves
development time.
Simulate I/O Peripherals: The simulator provides the option to simulate various I/O peripherals.
Using the simulator's I/O support you can edit the values of I/O registers, and these values are used as the
input/ output values in the firmware execution. Hence it eliminates the need for connecting I/O
devices for debugging the firmware.
Simulates Abnormal Conditions: With simulator's simulation support you can input any desired
value for any parameter during debugging the firmware and can observe the control flow of
firmware. It really helps the developer in simulating abnormal operational environment for
firmware and helps the firmware developer to study the behavior of the firmware under abnormal
input conditions.
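The idea of editing I/O register values to drive the firmware, including abnormal values, can be imitated in plain C by letting the firmware logic read its input through a structure instead of a fixed hardware address. Everything in this sketch is hypothetical: the register name adc_data, the threshold of 200 and the convention that 0xFF means a disconnected sensor are assumptions made only for the example.

```c
#include <stdint.h>

/* Simulated I/O register block: on the target this would be a fixed
   memory-mapped address; here it is ordinary memory that can be edited. */
typedef struct {
    volatile uint8_t adc_data;   /* hypothetical 8-bit sensor reading */
} sim_io_t;

enum { TEMP_OK = 0, TEMP_HIGH = 1, TEMP_FAULT = 2 };

/* Firmware logic under test: classifies the raw reading.
   0xFF is treated as a disconnected sensor (assumed convention). */
int classify_temperature(const sim_io_t *io)
{
    uint8_t raw = io->adc_data;

    if (raw == 0xFF)
        return TEMP_FAULT;
    return raw > 200 ? TEMP_HIGH : TEMP_OK;
}
```

Setting adc_data to 25, 230 and 0xFF in turn exercises the normal, over-limit and fault paths of the firmware without any sensor being connected, which is the kind of abnormal input study described above.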
Limitations of Simulator Based Debugging: Though simulation based firmware debugging
is very helpful in embedded applications, it has certain limitations and we cannot fully
rely on simulator-based firmware debugging. Some of the limitations of simulator-based debugging
are explained below:
Deviation from Real Behavior: Simulation-based firmware debugging is always carried out in a
development environment where the developer may not be able to debug the firmware under all
possible combinations of input. Under certain operating conditions, we may get some particular
result and it need not be the same when the firmware runs in a production environment.
Lack of Real Timeliness: The major limitation of simulator based debugging is that it is not real-
time in behavior. The debugging is developer driven and is in no way capable of creating real-
time behavior. Moreover, in a real application the I/O conditions may be varying or unpredictable,
whereas simulation reproduces those conditions only for known values.
Emulators and Debuggers:
Debugging in embedded applications is the process of diagnosing the firmware execution, monitoring the
target processor's registers and memory while the firmware is running, and checking the signals from
various buses of the embedded hardware. The debugging process in embedded applications is broadly
classified into two types: hardware debugging and firmware debugging.
Hardware debugging deals with the monitoring of various bus signals and checking the status
lines of the target hardware.
Firmware debugging deals with examining the firmware execution, execution flow, changes to
various CPU registers and status registers on execution of the firmware to ensure that the
firmware is running as per the design.
Firmware debugging is performed to find the bug or error in the firmware that causes the
unexpected behavior. The following section describes the evolution of firmware debugging techniques,
from the most primitive type of debugging to the most sophisticated, On Chip Debugging (OCD):
Incremental EEPROM Burning Technique: This is the most primitive type of firmware
debugging technique, where the code is separated into different functional code units. Instead of
burning the entire code into the EEPROM chip at once, the code is burned in incremental order:
the code corresponding to each functionality is separately coded, cross-compiled and
burned into the chip one by one.
Inline Breakpoint Based Firmware Debugging: Inline breakpoint based debugging is another
primitive method of firmware debugging. Wherever you want to ensure that firmware execution
reaches a specified point, insert an inline debug code immediately after that point. The debug
code is a printf() call which prints a string chosen to identify that point in the firmware. You can
insert debug codes (printf() calls) at each point where you want to confirm that firmware
execution covers that point. Cross-compile the source code with the debug codes embedded
within it, and burn the corresponding hex file into the EEPROM.
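A minimal sketch of inline debug codes is shown below. The macro name DEBUG_TRACE, the counter checkpoints_hit and the function process_sample() are hypothetical names chosen for illustration. The sketch also assumes that printf() output reaches the developer, which on a real target usually means it has been redirected to a serial port.

```c
#include <stdio.h>

/* Counts how many trace points were reached (illustrative only). */
static int checkpoints_hit;

/* Hypothetical inline debug code: prints a string identifying the point
   reached and counts it. */
#define DEBUG_TRACE(msg)                  \
    do {                                  \
        checkpoints_hit++;                \
        printf("[DBG] %s\n", (msg));      \
    } while (0)

/* Example firmware routine with debug codes placed right after the points
   of interest. */
int process_sample(int sample)
{
    DEBUG_TRACE("process_sample: entered");

    if (sample < 0) {
        DEBUG_TRACE("process_sample: negative sample rejected");
        return -1;
    }

    DEBUG_TRACE("process_sample: sample accepted");
    return sample * 2;
}
```

If the string for a given point never appears in the output, execution did not reach that point, which narrows down where the firmware goes wrong. The debug codes should be removed or compiled out of the final build, since they add code size and slow execution.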
Monitor Program Based Firmware Debugging: Monitor program based firmware debugging is
the first adopted invasive method for firmware debugging (see the following Figure). In this
approach a monitor program, which acts as a supervisor, is developed. The monitor program
controls the downloading of user code into the code memory, inspects and modifies register/
memory locations, allows single stepping of source code, etc. The monitor program implements
the debug functions as per a pre-defined command set from the debug application interface. The
monitor program always listens to the serial port of the target device and, according to the
command received from the serial interface, performs command-specific actions such as firmware
downloading, memory inspection/modification and firmware single stepping, and sends the debug
information (various register and memory contents) back to the main debug program running on
the development PC.
o The first step in any monitor program development is determining a set of commands for
performing various operations like firmware downloading, memory/register inspection/
modification, single stepping, etc. The entire code that handles command reception
and implements the corresponding actions is known as the "monitor program". The most
common type of interface used between the target board and the debug application is the
RS-232C serial interface.
o The monitor program contains the following set of minimal features:
1. Command set interface to establish communication with the debugging
application
2. Firmware download option to code memory
3. Examine and modify processor registers and working memory (RAM)
4. Single step program execution
5. Set breakpoints in firmware execution
6. Send debug information to debug application running on host machine.
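The command handling at the core of a monitor program can be sketched as below. The command set ('R' to read a byte, 'W' to write a byte, with replies 'K' for success and 'E' for error) is a made-up example and not the protocol of any real monitor. The array mon_ram stands in for the target's working memory, and the serial port reception and transmission are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MON_RAM_SIZE 256u

/* Stand-in for the target's working memory (RAM). */
static uint8_t mon_ram[MON_RAM_SIZE];

/* Hypothetical command set, for illustration only:
     'R' addr        -> reply 'K' value   (examine memory)
     'W' addr value  -> reply 'K'         (modify memory)
     anything else   -> reply 'E'         (unknown or malformed command)
   Returns the number of reply bytes written, or 0 if the reply buffer
   is too small to hold the longest reply. */
size_t monitor_handle(const uint8_t *cmd, size_t len,
                      uint8_t *reply, size_t reply_cap)
{
    assert(reply != NULL);

    if (reply_cap < 2u) {
        return 0;
    }
    if (len == 2u && cmd[0] == 'R') {
        reply[0] = 'K';
        reply[1] = mon_ram[cmd[1]];   /* 8-bit address always fits in mon_ram */
        return 2;
    }
    if (len == 3u && cmd[0] == 'W') {
        mon_ram[cmd[1]] = cmd[2];
        reply[0] = 'K';
        return 1;
    }
    reply[0] = 'E';
    return 1;
}
```

In a real monitor, a loop would receive command bytes over the serial interface, pass each complete command to a handler like this one, and transmit the reply bytes back to the debug application on the host. Commands for firmware download, single stepping and breakpoints would be added to the same dispatcher.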
In Circuit Emulator (ICE) Based Firmware Debugging: The terms 'Simulator' and 'Emulator'
are a little confusing and sound similar. Though their basic function is the same, namely to debug
the target firmware, the way in which they achieve it is totally different. The
simulator 'simulates' the target board CPU and the emulator 'emulates' the target board CPU.
o 'Simulator' is a software application that precisely duplicates (mimics) the target CPU
and simulates the various features and instructions supported by the target CPU.
o 'Emulator' is a self-contained hardware device which emulates the target CPU. The
emulator hardware contains necessary emulation logic and it is hooked to the debugging
application running on the development PC on one end and connects to the target board
through some interface on the other end.
o The Emulator POD (see the following Figure) forms the heart of any emulator system
and it contains the following functional units.
o Emulation Device: is a replica of the target CPU which receives various signals from the
target board through a device adaptor connected to the target board and performs the
execution of firmware under the control of debug commands from the debug application.
o Emulation Memory: is the Random Access Memory (RAM) incorporated in the emulator
device. It acts as a replacement for the target board's EEPROM, into which the code would
otherwise have to be downloaded after each firmware modification. Hence the original
EEPROM memory is emulated by the RAM of the emulator. This is known as 'ROM
Emulation'. ROM emulation eliminates the hassle of ROM burning and offers the
benefit of a practically unlimited number of reprogramming cycles.
o Emulator Control Logic: is the logic circuits used for implementing complex hardware
breakpoints, trace buffer trigger detection, trace buffer control, etc. Emulator control
logic circuits are also used for implementing logic analyzer functions in advanced
emulator devices. The 'Emulator POD' is connected to the target board through a 'Device
adaptor' and signal cable.
o Device Adaptors: act as an interface between the target board and emulator POD. Device
adaptors are normally pin-to-pin compatible sockets which can be inserted/ plugged into
the target board for routing the various signals from pins assigned for the target
processor. The device adaptor is usually connected to the emulator POD using ribbon
cables.
On Chip Firmware Debugging (OCD): Advances in semiconductor technology have brought
new dimensions to target firmware debugging. Today almost all processors/controllers
incorporate built-in debug modules called On Chip Debug (OCD) support. Though OCD adds
silicon complexity and cost, from a developer's perspective it is a very good feature
supporting fast and efficient firmware debugging. The On Chip Debug facilities integrated into the
processor/controller are chip vendor dependent, and most of them are proprietary technologies
like Background Debug Mode (BDM), OnCE, etc.
Logic Analyzer:
A logic analyzer is the big brother of the digital CRO (cathode ray oscilloscope). A logic analyzer is
used for capturing digital data (logic 1 and 0) from digital circuitry, whereas a CRO is employed for
capturing all kinds of waveforms, including logic signals. A major limitation of the CRO is that the
total number of logic signals/waveforms that can be captured is limited to its number of channels.
A logic analyzer contains special connectors and clips which can be attached to the target board for
capturing digital data. In target board debugging applications, a logic analyzer captures the states of
various port pins, address bus and data bus of the target processor/ controller, etc.
Logic analyzers give an exact reflection of what happens when a particular line of firmware is running.
This is achieved by capturing the address line logic and data line logic of the target hardware. Most
modern logic analyzers contain provisions for storing captured data, selecting a desired region of the
captured waveform, zooming into the selected region, etc. Tektronix, Agilent, etc. are the giants
in the logic analyzer market.
Function Generator:
A function generator is not a debugging tool; it is an input signal simulator tool. A function generator
is capable of producing various periodic waveforms like sine wave, square wave, saw-tooth wave, etc.
with different frequencies and amplitudes.
Sometimes the target board may require some kind of periodic waveform with a particular frequency as
input to some part of the board. Thus, in a debugging environment, the function generator serves the
purpose of generating and supplying required signals.
BOUNDARY SCAN:
As the complexity of the hardware increases, the number of chips on the board and the
interconnections among them may also increase. The device packages used on the PCB become
miniature to reduce the total board space they occupy, and multiple layers may be required to route
the interconnections among the chips. With miniature device packages and multilayer PCBs, it is
very difficult to debug the hardware using a magnifying glass, multimeter, etc. to check the
interconnections among the various chips.
Boundary scan is a technique used for testing the interconnections among the chips on the board
that support the JTAG interface. Chips which support boundary scan associate a boundary scan cell
with each pin of the device.
A JTAG port contains five signal lines, namely TDI, TDO, TCK, TRST and TMS, which form the Test
Access Port (TAP) of a JTAG supported chip. Each device has its own TAP. The PCB also
contains a TAP for connecting the JTAG signal lines to the external world.
A boundary scan path is formed inside the board by interconnecting the devices through JTAG signal
lines. The TDI pin of the TAP of the PCB is connected to the TDI pin of the first device.
The TDO pin of the first device is connected to the TDI pin of the second device. In this way all devices
are interconnected and the TDO pin of the last JTAG device is connected to the TDO pin of the TAP of
the PCB. The clock line TCK and the Test Mode Select (TMS) line of the devices are connected to the
clock line and Test mode select line of the Test Access Port of the PCB respectively. This forms a
boundary scan path.
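The daisy-chained scan path described above can be modelled in C as a series of shift registers. This is a simplified sketch under stated assumptions: every device is given an 8-cell boundary register (real devices have one cell per pin), only the shifting of data is modelled, and the TMS-driven TAP state machine and TRST are ignored. The names bscan_device_t and bscan_clock() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CELLS_PER_DEVICE 8u   /* assumed register length for the sketch */

/* One JTAG device, reduced to its boundary scan register. */
typedef struct {
    uint8_t cells;
} bscan_device_t;

/* Models one TCK pulse while the chain is shifting data.
   'tdi' is the bit presented on the TDI pin of the board's TAP.
   Each device outputs its oldest bit on TDO, which becomes the TDI of the
   next device. The return value is the TDO of the last device, i.e. the
   TDO pin of the board's TAP. */
int bscan_clock(bscan_device_t *chain, size_t n, int tdi)
{
    assert(chain != NULL || n == 0);

    int bit = tdi & 1;
    for (size_t i = 0; i < n; i++) {
        /* Bit leaving this device on its TDO pin (taken before the shift). */
        int out = (chain[i].cells >> (CELLS_PER_DEVICE - 1u)) & 1;

        /* Shift the register by one cell and take in the incoming bit. */
        chain[i].cells = (uint8_t)((chain[i].cells << 1) | bit);

        bit = out;   /* feeds the TDI pin of the next device */
    }
    return bit;
}
```

With two devices in the chain, a 16-bit pattern clocked in on TDI fills both registers, and 16 further clock pulses bring the same pattern out on TDO. Comparing what was shifted in with what comes out is the basic way a break in the scan path shows up.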