Unit 3

The document discusses real-time systems, emphasizing their need to process information and respond within specified time constraints to avoid severe consequences. It outlines the structure of real-time systems, including the roles of sensors, controllers, and actuators, and covers task assignment and scheduling techniques to ensure deadlines are met. Additionally, it explores concepts like multiprogramming, multitasking, multiprocessing, and multithreading, along with scheduling algorithms such as Rate Monotonic and Earliest Deadline First.

UNIT-3

PROCESSES AND OPERATING SYSTEMS


What is a real time system?
• A real-time system is one that must process
information and produce a response within a
specified time, else risk severe consequences,
including failure.



STRUCTURE OF A REAL-TIME SYSTEM



STRUCTURE OF A REAL-TIME SYSTEM
• The state of the controlled process and of the
operating environment (e.g., pressure,
temperature, speed, and altitude) is acquired
by sensors, which provide inputs to the
controller, the real-time computer.
• The data rate from each sensor depends on
how quickly the measured parameters can
change.



STRUCTURE OF A REAL-TIME SYSTEM

• There is a fixed set of application tasks or jobs.
• The software for these tasks is preloaded into the computer.
• If the computer has a shared main memory, then the entire software is loaded into it.
• If, on the other hand, the computer consists of a set of private memories belonging to individual processors, the software for each task is loaded into the private memory of the processor that runs it.



STRUCTURE OF A REAL-TIME SYSTEM
• The "trigger generator" is a representation of the
mechanism used to trigger the execution of individual
jobs.
• It is not really a separate hardware unit; typically
it is part of the executive software.
• Many of the jobs are periodic (i.e., they execute
regularly).
• The schedule for these jobs can be obtained offline and
loaded as a lookup table to be used by the scheduler.
• Jobs can also be initiated depending on the state of the
controlled process or on the operating environment.



STRUCTURE OF A REAL-TIME SYSTEM
• The output of the computer is fed to the
actuators and the displays.
• Fault tolerant techniques ensure that, despite
a small number of erroneous outputs from the
computer, the actuators are set correctly.
• The actuators typically have a mechanical or a
hydraulic component, and so their time
constants are quite high.
• As a result, the data rates to the actuators are
quite low: one command per 25 ms.
STRUCTURE OF A REAL-TIME SYSTEM
• A control computer exhibits a dichotomy in
terms of the data rates.
• The sensors and actuators run at relatively low
data rates.
• The computer itself must be fast enough to
execute the control algorithms, and these can
require throughputs in excess of 50 million
instructions per second (MIPS).



STRUCTURE OF A REAL-TIME SYSTEM
As a result, the system separates into three areas:
• An outer, low-rate area consisting of the sensors, actuators, displays, and input panels.
• A middle or peripheral area consisting of the processing that is necessary to format the data.
• The central cluster of processors where the control algorithms are executed.



STRUCTURE OF A REAL-TIME SYSTEM

For example, there are four phases in a civilian aircraft flight:
1. Take off and cruise until the VHF omni range / distance measuring equipment (VOR/DME) is out of range.
2. Cruise until VOR/DME is in range again.
3. Cruise until landing is to be initiated.
4. Land.



TASK ASSIGNMENT AND SCHEDULING
• Here we look into techniques for allocating
and scheduling tasks on processors to ensure that
deadlines are met.
• The allocation/scheduling problem can be stated as
follows.
• Given a set of tasks, task precedence constraints,
resource requirements, task characteristics and
deadlines, we are asked to devise a feasible
allocation/schedule on a given computer.



TASK ASSIGNMENT AND SCHEDULING
• A task consumes resources (e.g., processor time, memory,
and input data), and puts out one or more results.
• Tasks may have precedence constraints, which specify if any
task(s) needs to precede other tasks.
• Each task has resource requirements.
• Sometimes, a resource must be exclusively held by a task
(i.e., the task must have sole possession of it). In other
cases, resources are nonexclusive.
• The same physical resource may be exclusive or
nonexclusive depending on the operation to be performed
on it.



TASK ASSIGNMENT AND SCHEDULING
Task Characteristics:
• All tasks require some execution time on a processor.
• A task may require a certain amount of memory or
access to a bus.
• The release time of a task is the time at which all the data that are required to begin executing the task are available.
• The deadline is the time by which the task must
complete its execution.
• A task assignment/schedule is said to be feasible if all the
tasks start after their release times and complete before
their deadlines.



TASK SCHEDULING
• Offline scheduling involves scheduling in advance of system operation.
• In online scheduling, the tasks are scheduled as they arrive in the system.
• Algorithms that assume that task priority does not change within a mode are called static-priority algorithms.
  Ex: Rate-Monotonic (RM) algorithm
• Algorithms that assume that task priority can change with time are called dynamic-priority algorithms.
  Ex: Earliest Deadline First (EDF) algorithm



TASK SCHEDULING
• A schedule is preemptive if tasks can be interrupted by other
tasks (and then resumed).
• A schedule is nonpreemptive if each task, once started, must run to completion or until it blocks on a resource.
• Consider a two-task system. Let the release times of tasks T1 and T2
be 1 and 2, respectively; the deadlines be 9 and 6; and the execution times
be 3.25 and 2 respectively.



Multiple tasks and multiple processes

🡺 Process

🡺 Multiprogramming

🡺 Multitasking

🡺 Multiprocessing

🡺 Multithreading

19
Process
🡺 A single execution of a program is called a process.
🡺 If we run the same program two different times, we have created two different processes.
🡺 Each process has its own state that includes not only its registers but all of its memory.
🡺 In some OSs, the memory management unit is used to keep each process in a separate address space.
🡺 In others, particularly lightweight RTOSs, the processes run in the same address space.
🡺 Processes that share the same address space are often called threads.
Multiprogramming

🡺 Multiprogramming is the ability of an operating system to execute more than one program on a single-processor machine.
🡺 More than one task/program/job/process can reside in main memory at one point of time.
🡺 A computer running Excel and the Firefox browser simultaneously is an example of multiprogramming.
21
Memory Layout for Multiprogramming System

22
Multitasking

23
Multitasking

🡺 Multitasking is the ability of an operating system to execute more than one task simultaneously on a single-processor machine.
🡺 Though we say so, in reality no two tasks on a single-processor machine can be executed at the same time.
🡺 Actually, the CPU switches from one task to the next so quickly that it appears as if all the tasks are executing at the same time.
24
Multitasking System

25
Multiprocessing
🡺Multiprocessing is the ability of an operating
system to execute more than one process
simultaneously on a multi processor machine.
🡺In this, a computer uses more than one CPU at a
time.

26
Multithread
🡺 Threads are lightweight processes which are independent parts of a process or program.
🡺 Processes that share the same address space are often called threads.

27
Multithread
🡺 Multithreading is the ability of an operating system to execute different parts of a program, called threads, at the same time.
🡺 Threads are lightweight processes which are independent parts of a process or program.
🡺 In a multithreading system, more than one thread is executed in parallel on a single CPU.
28
Threads vs Process
• Thread: a single unit of execution that is part of a process. Process: a program in execution that contains one or more threads.
• Thread: does not have its own data memory and heap memory; it shares the data memory and heap memory with the other threads of the same process. Process: has its own code memory, data memory and stack memory.
• Thread: cannot live independently; it lives within a process. Process: contains at least one thread.
• Thread: very inexpensive to create. Process: very expensive to create; creation involves much OS overhead.
• Thread: context switching is inexpensive and fast. Process: context switching is complex, involves a lot of OS overhead and is comparatively slower.
• Thread: if a thread expires, its stack is reclaimed by its process. Process: if a process dies, the resources allocated to it are reclaimed by the OS and all the associated threads of the process also die.

29
Tasks and Processes

30
Multirate Systems
🡺 In multirate systems, certain operations must be executed periodically, and each operation is executed at its own rate.

🡺 Ex: automobile engines, printers, cellphones.

31
Multirate Systems
🡺 Timing Requirements on processes

🡺 CPU Usage Metrics

🡺 Process state and Scheduling

🡺 Running Periodic processes

32
Pre-emptive real-time operating systems
• A preemptive real-time operating system solves the fundamental problems of a cooperative multitasking system.
• An RTOS executes processes based upon timing constraints provided by the system designer.
• The most reliable way to meet timing constraints accurately is to build a preemptive OS and to use priorities to control what process runs at any given time.

33
Preemptive real-time operating systems

• Two Important Methods


– Preemption

– Priorities

• Process and Context

• Processes and Object Oriented Design

34
Preemption
• Pre-emption is an alternative to the C function
call as a way to control execution

• It requires creating new routines that allow us to jump from one subroutine to another at any point in the program.

35
Pre-emption
• The kernel is the part of the OS that determines what process is running.
• The length of the timer period is known as the time quantum.

36
Context Switching
• The set of registers that defines a process is known as its context.
• Switching from one process's register set to another is known as context switching.
• The data structure that holds the state of a process is known as its record.

37
Process Priorities
• Each process is assigned a numerical priority.
• The kernel simply looks at the processes and their priorities and selects the highest-priority process that is ready to run.

38
Process and
Context

39
Process and Object oriented design
• UML often refers to processes as active objects, that
is, objects that have independent threads of control.
• The class that defines an active object is known as an
active class.
• It has all the normal characteristics of a class,
including a name, attributes and operations.
• It also provides a set of signals that can be used to
communicate with the process.

40
Priority Based Scheduling
• Round-Robin Scheduling

• Process Priorities

• Rate Monotonic Scheduling

• Earliest Deadline first scheduling

• Shared Resources

• Priority Inversion
41
Round-Robin Scheduling
• Round-robin is a pre-emptive process scheduling algorithm.
• Each process is provided a fixed time to execute, called a quantum.
• Once a process has executed for its time quantum, it is preempted and another process executes for its own time quantum.
• Context switching is used to save states of
preempted processes.

42
Round-Robin Scheduling

43
Round-Robin Scheduling

44
Process Priorities
• Priority scheduling is a non-preemptive algorithm and one of the most common scheduling algorithms in batch systems.
• Each process is assigned a priority. The process with the highest priority is executed first, and so on.
• Processes with the same priority are executed on a first-come, first-served basis.
• Priority can be decided based on memory requirements, time requirements or any other resource requirement.
45
Rate Monotonic scheduling
• Rate-monotonic scheduling (RMS) was introduced by Liu and Layland.
• It is one of the first scheduling policies developed for real-time systems and is still very widely used.
• RMS assigns task priorities in order of task frequency: the task with the shortest period gets the highest priority, the task with the next-shortest period gets the second-highest priority, and so on.
46
Rate Monotonic scheduling
• This model uses a relatively simple model of
the system
– All processes run periodically on a single CPU.
– Context switching time is ignored.
– There are no data dependencies between
processes.
– The execution time for a process is constant.
– All deadlines are at the ends of their periods.
– The highest-priority ready process is always
selected for execution.
47
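To make the priority-assignment rule concrete, here is a minimal C sketch (not from the slides) that orders a hypothetical task table by period, shortest period first, which is exactly the rate-monotonic priority order. The task set used is only an illustration.

#include <stdio.h>
#include <stdlib.h>

struct task {
    const char *name;
    double period;      /* shorter period => higher RM priority */
    double exec_time;
};

static int by_period(const void *a, const void *b) {
    const struct task *x = a, *y = b;
    return (x->period > y->period) - (x->period < y->period);
}

int main(void) {
    /* hypothetical task set used only for illustration */
    struct task set[] = {
        {"T1", 4.0, 1.0}, {"T2", 6.0, 2.0}, {"T3", 8.0, 3.0}
    };
    int n = sizeof(set) / sizeof(set[0]);

    qsort(set, n, sizeof(set[0]), by_period);   /* sort by period */

    for (int i = 0; i < n; i++)                 /* index 0 = highest priority */
        printf("priority %d: %s (period %.1f)\n", i, set[i].name, set[i].period);
    return 0;
}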
Rate Monotonic scheduling

48
Earliest Deadline First scheduling
• Earliest Deadline First (EDF) is a dynamic
priority algorithm
• The priority of a job is inversely proportional
to its absolute deadline;
• In other words, the highest priority job is the
one with the earliest deadline;

49
Earliest Deadline First scheduling
• Example

Task  Execution Time  Period
T1    1               4
T2    2               6
T3    3               8

50
Earliest Deadline First scheduling
Task  Execution Time  Period
T1    1               4
T2    2               6
T3    3               8

• Observe that at time 6, even though the deadline of task 3 is very close, the scheduler decides to schedule task 2.
• This is the main reason why T3 misses its deadline.

51
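As a quick cross-check (not part of the original slides): the total utilization of this task set is U = 1/4 + 2/6 + 3/8, which is about 0.958. Since U is at most 1, EDF can feasibly schedule the set, whereas U exceeds the rate-monotonic bound of 3(2^(1/3) - 1), roughly 0.780, so RM is not guaranteed to meet all deadlines. A minimal C sketch of this test, with the task parameters hard-coded as assumptions, might look like:

#include <math.h>
#include <stdio.h>

int main(void) {
    /* example task set from the slides: (execution time, period) */
    double C[] = {1.0, 2.0, 3.0};
    double T[] = {4.0, 6.0, 8.0};
    int n = 3;

    double U = 0.0;
    for (int i = 0; i < n; i++)
        U += C[i] / T[i];                       /* total processor utilization */

    double rm_bound = n * (pow(2.0, 1.0 / n) - 1.0);   /* Liu-Layland bound */

    printf("U = %.3f\n", U);
    printf("EDF feasible: %s\n", U <= 1.0 ? "yes" : "no");
    printf("RM guaranteed: %s\n", U <= rm_bound ? "yes" : "no (test inconclusive)");
    return 0;
}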
Earliest Deadline First scheduling

• Observe that at time 6, the problem does not appear, as the job with the earliest deadline is executed.
52
RMS vs EDF
• RMS achieves lower CPU utilization; EDF achieves higher CPU utilization.
• With RMS it is easier to ensure that all deadlines will be satisfied; with EDF it is harder to ensure deadlines.
• RMS is static-priority scheduling; EDF is dynamic-priority scheduling.
• RMS is not expensive to use in practice; EDF is expensive to use in practice.
• In RMS the shortest-period process gets the highest priority; in EDF the process closest to its deadline has the highest priority.
53
Shared Resources
•While dealing with shared resources,
special care must be taken

•Race Condition

•Critical Sections

•Semaphores

54
Shared Resources

• Race Condition
– Consider the case in which an I/O device has a flag that must be tested and modified by a process.
– Problems may arise when other processes may also want to access the device.
– If combinations of events from the two tasks operate on the device in the wrong order, we may create a critical timing race, or race condition.

55
Shared Resources
• Critical Sections
– To prevent race condition problems, we need to control the order in which some operations occur.
– We need to be sure that a task finishes an I/O operation before allowing another task to start its own operation on that I/O device.
– This is achieved by enclosing sensitive sections of code in a critical section that executes without interruption.

56
Shared Resources

• Semaphores
– We create a critical section using semaphores, which are primitives provided by the OS.
– The semaphore is used to guard a resource.
– We start a critical section by calling a semaphore function that does not return until the resource is available.
– When we are done with the resource, we use another semaphore function to release it.

P();    //wait for semaphore
//do protected work here
V();    //release semaphore
57
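For readers who want to try the same pattern on a desktop OS, the P()/V() protocol maps onto POSIX semaphores, where sem_wait plays the role of P and sem_post the role of V. This is only an analogy sketch, not the RTOS API assumed by the slides:

#include <semaphore.h>
#include <stdio.h>

static sem_t lock;          /* guards the shared resource */
static int shared_count;    /* the protected resource */

void use_resource(void) {
    sem_wait(&lock);        /* P(): block until the resource is free */
    shared_count++;         /* protected work inside the critical section */
    sem_post(&lock);        /* V(): release the resource */
}

int main(void) {
    sem_init(&lock, 0, 1);  /* binary semaphore, initially available */
    use_resource();
    printf("count = %d\n", shared_count);
    sem_destroy(&lock);
    return 0;
}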
Priority Inversion

• A low-priority process blocks execution of a higher-priority process by keeping hold of a resource the higher-priority process needs. This is priority inversion.
• Priority inversion is dealt with by priority inheritance.
• In priority inheritance:
– The priority of the process holding the resource is promoted temporarily.
– Its priority becomes higher than that of any other process that may use the resource.
– Once the process is finished with the resource, its priority is demoted to its normal value.
58
INTERPROCESS COMMUNICATION MECHANISMS

• Inter-process communication mechanisms are provided by the operating system as part of the process abstraction.
• Two ways of communication:
– Blocking communication: the process goes into the waiting state until it receives a response.
– Non-blocking communication: the process is allowed to continue execution after sending the communication.

59
INTERPROCESS COMMUNICATION MECHANISMS

• Four major styles of inter-process communication


– Shared Memory

– Message passing

– Signals

– Mailboxes

60
Shared Memory

• CPU and I/O device communicate through a


shared memory location

61
EX: Shared Memory Communication
(Elastic buffers as shared memory)
The text compressor uses the CPU to compress incoming text, which is then sent on a serial line by a UART.
The input data arrive at a constant rate and are easy to manage. But because the output data are consumed at a variable rate, these data require an elastic buffer.
The CPU and the output UART share a memory area: the CPU writes compressed characters into the buffer and the UART removes them as necessary to fill the serial line.
Because the number of bits in the buffer changes constantly, the compression and transmission processes need additional size information.
In this case, coordination is simple: the CPU writes at one end of the buffer and the UART reads at the other end. The only challenge is to make sure that the UART does not overrun the buffer.
62
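A minimal sketch of such an elastic buffer, assuming a single producer (the CPU) and a single consumer (e.g., the UART transmit interrupt); the buffer size and the calling context are illustrative assumptions, not part of the slides:

#include <stdint.h>

#define BUF_SIZE 64                        /* capacity is BUF_SIZE - 1 characters */

static volatile uint8_t buf[BUF_SIZE];
static volatile unsigned head, tail;       /* CPU writes at head, UART reads at tail */

/* CPU side: returns 0 if the buffer is full (caller must retry later) */
int buffer_put(uint8_t c) {
    unsigned next = (head + 1) % BUF_SIZE;
    if (next == tail)
        return 0;                          /* full: do not overrun the consumer */
    buf[head] = c;
    head = next;
    return 1;
}

/* UART side (e.g., called from the transmit-ready interrupt) */
int buffer_get(uint8_t *c) {
    if (head == tail)
        return 0;                          /* empty: nothing to send */
    *c = buf[tail];
    tail = (tail + 1) % BUF_SIZE;
    return 1;
}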
Race condition in shared memory
• Problem when two CPUs try to write the
same location:
– CPU 1 reads flag and sees 0.
– CPU 2 reads flag and sees 0.
– CPU 1 sets flag to one and writes location.
– CPU 2 sets flag to one and overwrites
location.
Atomic test-and-set
• The problem can be solved with an atomic test-and-set:
– a single bus operation reads the memory location, tests it, and writes it.
• The ARM test-and-set is provided by the SWP instruction.
• A test-and-set can be used to implement a semaphore (used to guard access to a block of protected memory), which is a language-level synchronization construct.
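As a portable illustration (not the ARM SWP instruction itself), C11 exposes an atomic test-and-set primitive that can be used to build a simple spinlock; a rough sketch:

#include <stdatomic.h>

static atomic_flag lock = ATOMIC_FLAG_INIT;

void spin_lock(void) {
    /* atomically read the old value and set the flag in one operation */
    while (atomic_flag_test_and_set(&lock))
        ;                              /* busy-wait until the flag was clear */
}

void spin_unlock(void) {
    atomic_flag_clear(&lock);          /* release: the next test-and-set sees 'clear' */
}

/* usage: spin_lock(); ...critical region... spin_unlock(); */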
Critical regions
• Critical region: section of code that cannot
be interrupted by another process.
• Examples:
– writing shared memory;
– accessing I/O device.
Semaphores
• Semaphore: OS primitive for controlling
access to critical regions.
• Protocol:
– Get access to semaphore with P().
– Perform critical region operations.
– Release semaphore with V().
Message passing
•Each communicating entity has its own
message send/receive unit
• The message is stored at the sender's/receiver's endpoints.

67
Message passing
• For example, a home control system has one microcontroller per household device: lamp, fan, and so on.
• The devices must communicate relatively infrequently.
• Their physical separation is large enough that we would not naturally think of them as sharing a central pool of memory.
• Passing communication packets among the devices is a natural implementation of communication in many 8-bit controllers.
68
Signals
•Another form of inter-process communication
commonly used in Unix is Signal.
•A signal is analogous to an interrupt, but it is entirely
a software creation.
• A signal is generated by a process and transmitted to another process by the operating system.

69
Mailboxes

• A mailbox is an asynchronous communication mechanism.
• Mailboxes have a fixed number of bits and can be used for small messages.
• We can also implement a mailbox using P() and V(), using main memory for the mailbox storage.
• A mailbox should contain two items:
– a message
– a mail-ready flag

70
Mailboxes

void post(message *msg)
{
    P(mailbox.sem);           //wait for the mailbox
    copy(mailbox.data, msg);  //copy the message into the mailbox
    mailbox.flag = TRUE;      //mark the mailbox as full
    V(mailbox.sem);           //release the mailbox
}

71
Mailboxes

boolean pickup(message *msg)
{
    boolean pickup = FALSE;
    P(mailbox.sem);           //wait for the mailbox
    pickup = mailbox.flag;    //was there mail waiting?
    mailbox.flag = FALSE;     //mark the mailbox as empty
    copy(msg, mailbox.data);  //copy the message out
    V(mailbox.sem);           //release the mailbox
    return (pickup);
}
72
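A short usage sketch (not in the slides) showing how one process might post a message while another polls for it with the routines above. The message type, copy(), P() and V() are assumed to come from the RTOS as in the slides; produce_data() and consume_data() are hypothetical application helpers named only for illustration.

/* sender process */
void sender_task(void) {
    message m;
    produce_data(&m);        /* hypothetical: fill in the message */
    post(&m);                /* deposit it in the mailbox */
}

/* receiver process */
void receiver_task(void) {
    message m;
    if (pickup(&m))          /* TRUE only if mail was waiting */
        consume_data(&m);    /* hypothetical: act on the message */
}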
Distributed Embedded
Systems

73
Distributed Embedded
Systems

74
Distributed Embedded
Systems
• Embedded system Architecture

• CAN Bus

• I2C Bus

• Ethernet

• Internet

75
Embedded system
Architecture

76
Embedded system
Architecture
• Network
Abstraction

77
CAN Bus
• CAN stands for Controller Area Network.
• Example: a network of embedded systems in an automobile.
• An automobile uses a number of distributed embedded controllers.
• The controllers provide the controls for brakes, engines, electrical power, lamps, temperature, air conditioning, car gates, the front display panel, meter display panels and cruising.

78
CAN Bus

79
CAN Bus

80
CAN Bus

81
CAN Protocol defined frame bits
• First field of 12 bits: the arbitration field.
– It contains an 11-bit destination address and the RTR (Remote Transmission Request) bit.
– The destination device address is specified in an 11-bit sub-field, and a 1-bit sub-field indicates whether the frame carries data for the device or a request to the device.
– A maximum of 2^11 (2048) devices can connect to a CAN controller in the case of the 11-bit address field standard.
– If RTR = 0, the frame carries data for the device at the destination address.
– If RTR = 1, the frame is a remote request for data from the device.
82
CAN Protocol defined frame bits
• Second field of 6 bits─ control field.
– The first bit is for the identifier’s extension.
– The second bit is always '1'.
– The last 4 bits specify code for data length
• Third field of 0 to 64 bits
– Its length depends on the data length code in the
control field.
• Fourth field of 16 bits─ CRC (Cyclic Redundancy
Check) bits.
– The receiver node uses it to detect the errors, if any,
during the transmission.
83
CAN Protocol defined frame bits
• Fifth field of 2 bits
• First bit: the 'ACK slot'.
– The transmitter sends '1' (recessive) in this slot; a receiver that has received the frame correctly drives it to '0' (dominant) to acknowledge. If the transmitter does not sense a '0' in the ACK slot, it generally retransmits the data frame.
• Second bit: the 'ACK delimiter'.
– It signals the end of the ACK field. If the transmitting node does not receive any acknowledgement of the data frame within the specified slot, it should retransmit.
84
CAN Protocol defined frame bits
• Sixth field of 7 bits
– The end-of-frame specification; it consists of seven consecutive recessive ('1') bits.
• Interframe bits
– A minimum of 3 bits separates two CAN frames.

85
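To keep the field layout straight, here is an illustrative C structure for the logical content of a standard (11-bit) CAN data frame. This is a software-side representation for documentation purposes only, not the exact bit layout a CAN controller places on the wire:

#include <stdint.h>

struct can_frame {
    uint16_t id;          /* 11-bit identifier (arbitration field) */
    uint8_t  rtr;         /* remote transmission request bit */
    uint8_t  dlc;         /* data length code, 0..8 bytes */
    uint8_t  data[8];     /* 0 to 64 bits of payload */
    uint16_t crc;         /* 15-bit CRC, normally generated by the controller hardware */
};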
I2C
• The name stands for "Inter-Integrated Circuit" bus.
• A small-area network connecting ICs and other electronic systems.
• Developed by Philips Semiconductors.
• I2C can support up to 128 slave devices.
• Today, a variety of devices are available with I2C interfaces:
– microcontrollers, EEPROMs, real-time clocks, interface chips, LCD drivers, A/D converters.
86
I2C

87
I2C
• Used for moving data simply and quickly from one
device to another
• Low cost, easy to implement and of moderate speed
• Serial Interface
• I2C is a synchronous protocol that allows a master
device to initiate communication with a slave device.
• I2C is also bi-directional by which data is sent either
direction on the serial data line (SDA) by the master
or slave.

88
I2C
• I2C is a Master-Slave protocol that allows
– The Master device controls the clock (SCL)
– The slave devices may hold the clock low to
prevent data transfer

– No data is transferred unless a clock signal is


present

– All slaves are controlled by the master clock

89
I2C
• SDA
– This signal is known as Serial Data. Any data sent
from one device to another goes on this line

• SCL
– This is the Serial Clock signal. It is generated by the
master device and controls when data is sent and
when it is read.

90
I2C distance
• Synchronous Serial Communication
– 400 kbps up to 2 m and

– 100 kbps for longer distances

91
I2C Bus operation
• SDA line Transmits/Receives data bits (MSB is sent
first)
• Data in SDA line is stable during clock (SCL) high
• Serial clock is driven by the master
• Acknowledgment bit is driven by the receiver after
the end of reception
• If the receiver does not acknowledge, the SDA line remains high.
(Figure: SDA/SCL waveforms showing the MSB sent first, the LSB last, and the ACK bit driven by the receiver.)
92
I2C Bus operation
• The format of I2C bus
transfers

93
Start Bit
• Initializes I2C Bus
• SDA is pulled low, then SCL is pulled
low

94
Stop Bit
• Releases I2C Bus
• SCL is released first, then SDA is released

95
Control bits
• The start bit is followed by a 7-bit slave address.
• The address bits are followed by the R/W bit.
• If R/W = 0, subsequent bytes transmitted on the bus will be written by the controller to the selected peripheral.
• If R/W = 1, subsequent bytes will be sent by the selected peripheral and read by the controller.
(Figure: address byte layout, a 7-bit slave address followed by the R/W bit.)
96
I2C typical message format

97
I2C typical message format
• For peripheral chip that contains more than
one internal register or memory address, the
PIC will typically write a second byte to the
chip to set a pointer to the selected internal
register or the consecutive addresses that
follow it

98
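As a sketch of the message format described above, the following hypothetical helper writes one byte to a register inside an I2C peripheral: a start condition, the address byte with R/W = 0, the register (pointer) byte, the data byte, then a stop. The i2c_start/i2c_write_byte/i2c_stop primitives are assumed to be provided by the platform and are named here only for illustration:

#include <stdint.h>

/* assumed platform-provided primitives (names are illustrative only) */
void i2c_start(void);
int  i2c_write_byte(uint8_t b);   /* returns 1 if the slave ACKed the byte */
void i2c_stop(void);

int i2c_reg_write(uint8_t slave_addr7, uint8_t reg, uint8_t value) {
    int ok;
    i2c_start();
    ok  = i2c_write_byte((uint8_t)(slave_addr7 << 1)); /* 7-bit address + R/W = 0 (write) */
    ok &= i2c_write_byte(reg);                         /* pointer to the internal register */
    ok &= i2c_write_byte(value);                       /* data for that register */
    i2c_stop();
    return ok;                                         /* 1 only if every byte was ACKed */
}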
I2C typical message format

99
I2C Advantages
• It is faster than asynchronous serial communication
• Number of pins required for communication is less
(only 2 pins)

• I2C supports multi master system

• I2C supports up to 128 slave devices

• I2C supports slave acknowledgment

100
I2C Disadvantages
• Communication is more complex than UART or SPI.
• I2C draws more power than other serial communication buses.
• Slower devices can slow down the operation of faster devices.
• A fault on the I2C bus can result in the entire bus hanging.
• The I2C bus is not suited to long distances.

101
Ethernet
• Very widely used as a local area network for
general purpose computing.

• Particularly useful when PCs are used as


platforms

102
Ethernet
• Ethernet nodes are not synchronized (they can send their bits at any time).
• If two nodes decide to transmit at the same time, the message will be ruined.
• Ethernet uses Carrier Sense Multiple Access with Collision Detection (CSMA/CD).
• A node that has a message waits for the bus to become silent and then starts transmitting.
• If it hears another transmission that interferes with its own, it stops transmitting and waits to retransmit.
• The waiting time is random.


103
Ethernet CSMA/CD algorithm

104
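The retry behaviour summarized above can be written out as a rough C sketch. The channel_is_idle(), send_frame(), collision_detected() and wait_us() functions are placeholders for the network interface, the slot time is an illustrative value, and the backoff follows the usual binary exponential idea; this is an illustration, not the exact IEEE 802.3 procedure:

#include <stdlib.h>

#define SLOT_TIME_US  51   /* nominal slot time, illustrative value only */
#define MAX_ATTEMPTS  16

/* placeholder hardware/network primitives (illustrative names) */
int  channel_is_idle(void);
void send_frame(const void *frame, int len);
int  collision_detected(void);
void wait_us(unsigned us);

int csma_cd_send(const void *frame, int len) {
    for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
        while (!channel_is_idle())
            ;                                  /* carrier sense: wait for silence */
        send_frame(frame, len);
        if (!collision_detected())
            return 1;                          /* transmitted successfully */
        /* binary exponential backoff: wait a random number of slot times */
        int k = (attempt < 10) ? attempt + 1 : 10;
        unsigned slots = (unsigned)rand() % (1u << k);
        wait_us(slots * SLOT_TIME_US);
    }
    return 0;                                  /* give up after too many collisions */
}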
Ethernet Packet format

preamble | start frame | source adrs | dest adrs | length | data payload | padding | CRC

105
Disadvantages
• Ethernet was not designed to support real time
operations
• The exponential backoff scheme cannot guarantee
delivery time of any data
• Three ways to reduce the variance in Ethernet’s
packet delivery time

– Suppress collisions on the network


– Reduce the number of collisions
– Resolve collisions deterministically
106
Internet
• Connectionless communication

• Packet based communication


• Internet packet will travel over several
different networks from source to destination

• IP allows data to flow seamlessly through


these networks from one end user to another

107
Protocol utilization in Internet Communication

A node that transmits data among


different types of networks is
known as a router

108
IP Packet structure

The maximum
total length of
the header and
data payload is
65,535 bytes

109
IP Packet structure
• An Internet address is 32 bits in the early version (IPv4).

• It is 128 bits in IPv6.

• An IPv4 address is typically written in dotted form, e.g., xxx.xx.xx.xx.

110
Internet service stack
• Simple Network Management Protocol (SNMP)
• User Datagram Protocol (UDP)
• File Transfer Protocol (FTP)
• Hypertext Transfer Protocol (HTTP)
• Simple Mail Transfer Protocol (SMTP)
• Transmission Control Protocol (TCP)

111
MPSoCs and Shared memory
multiprocessors
• MPSoC 🡺 Multiprocessor System-on-Chip.
• Shared-memory multiprocessors 🡺 well suited when large amounts of data must be processed.
• Most MPSoCs are shared-memory systems.
• Two types of MPSoCs:
– Symmetric
– Heterogeneous
• Heterogeneous multiprocessors use less energy and are less expensive.
112
TI TMS320DM816x DaVinci Board

113
Accelerated systems
• Use additional computational unit dedicated
to some functions

– Hardwired logic
– Extra CPU
• Hardware/software co-design: joint design of
hardware and software architectures

114
Accelerated system design
• First, determine whether the system really needs to be accelerated.
– How much faster is the accelerator on the core function?
• Design the accelerator itself.
• Design the CPU interface to the accelerator.

115
Accelerated system architecture

116
Accelerator/CPU interface

• Accelerator registers provide control


registers for CPU
• Data registers can be used for small data
objects
• Accelerator may include special-purpose
read/write logic
– Especially valuable for large data transfers

117
Accelerator implementations

• Application-specific integrated circuit(ASIC)

• Field-programmable gate array (FPGA)

• Standard component
– Example: graphics processor

118
Xilinx Zynq-7000
• AMBA bus connects to CPUs and FPGA fabric

119
Accelerator Performance Analysis
• Critical parameter is speedup: how much
faster is the system with the accelerator?

• Must take into account:


– Accelerator execution time
– Data transfer time
– Synchronization with the master CPU

120
Accelerator execution time

• Total accelerator execution time:

    t_accel = t_in + t_x + t_out

where t_in is the data input time, t_x is the accelerated computation time, and t_out is the data output time.

121
Accelerator speedup

• Assume the loop is executed n times.
• Compare the accelerated system to the non-accelerated system.
• The speedup S for a kernel can be written as:

    S = n (t_CPU - t_accel)
      = n [t_CPU - (t_in + t_x + t_out)]

where t_CPU is the execution time of the kernel on the CPU alone.

122
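A few lines of C make the bookkeeping explicit; all of the time values here are assumed inputs (measured or estimated), not values from the slides:

#include <stdio.h>

/* speedup of n accelerated loop iterations over running them on the CPU;
   as defined on the slide, this is the total time saved, not a ratio */
double speedup(int n, double t_cpu, double t_in, double t_x, double t_out) {
    double t_accel = t_in + t_x + t_out;   /* per-iteration accelerator time */
    return n * (t_cpu - t_accel);
}

int main(void) {
    /* illustrative numbers only */
    printf("time saved = %.1f us\n", speedup(1000, 12.0, 1.0, 4.0, 1.0));
    return 0;
}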
Single- vs. multi-threaded

• One critical factor is available parallelism:


– single-threaded/blocking: CPU waits for accelerator;
– multithreaded/non-blocking: CPU continues to
execute along with accelerator.

• To multithread, CPU must have useful work to do.


– But software must also support multithreading.

123
Total execution time

• Single-threaded vs. multi-threaded:
(Figure: timelines of processes P1-P4 and accelerator run A1, comparing the single-threaded and multi-threaded cases.)

124
Total execution time

125
Design Example – Engine Control Unit

126
Engine Control Unit-Theory of operation

⌘Throttle is command input.


⌘Engine measures throttle, RPM, intake air volume, etc.
⌘Controller computes injector pulse width and spark.
127
Requirements

128
Engine controller data periods

129
Specifications
Engine control calculations
⌘ Compute initial values:

    PW = (2.5 / (2 * NE)) * VS * (1 / (10 - k1 * ΔT))
    S  = k2 * ΔNE - k3 * VS

⌘ Applies corrections for warm-up, throttle opening, oxygen sensor, battery voltage.

130
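Written out as C, the initial-value computation might look like the sketch below. The constants k1, k2, k3 and the input values are placeholders; a real controller would read NE, VS, ΔT and ΔNE from sensors and then apply the correction factors listed above:

/* illustrative constants; real values are calibration-dependent */
#define K1 0.1
#define K2 0.5
#define K3 0.2

/* initial injector pulse width: PW = 2.5/(2*NE) * VS * 1/(10 - k1*dT) */
double pulse_width(double NE, double VS, double dT) {
    return (2.5 / (2.0 * NE)) * VS * (1.0 / (10.0 - K1 * dT));
}

/* initial spark advance: S = k2*dNE - k3*VS */
double spark_advance(double dNE, double VS) {
    return K2 * dNE - K3 * VS;
}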
Engine controller class diagram

131
Throttle position sensing state
diagram

132
Injector pulse width and spark
advance angle

133
Component design and testing
⌘ Processes at different periods must share state variables.
⌘ Engine control is mode-dependent and requires multiple test cases.
⌘ SAE has standards for coding practice, software development, requirements, and verification/validation.
System testing
⌘ The engine ignition system creates electrical noise that can prevent an unshielded controller from operating.
⌘ The engine compartment runs at high temperature.
134
Design Example – Audio Player

135
Design Example – Audio Player
A media player which can play only audio files is often called an MP3 player.
Functions:
1. Audio compression
2. Audio decompression
3. Masking

Audio Decompression
• The incoming bit stream is encoded using a Huffman code, which must be decoded. The audio data are then applied to a reconstruction filter.

Audio Compression
• A lossy process that relies on perceptual coding to encode the audio in fewer bits. The coder eliminates certain features of the audio stream.
136
Masking
One tone can be masked by another if the tones are sufficiently
close in frequency / time.

Standards
MP3 comes from MPEG-1 Layer 3.
The MPEG-1 audio standard defines 3 layers of audio compression:
1. Layer 1 (MP1): lossless compression of subbands and an optional simple masking model.
2. Layer 2 (MP2): uses a more advanced masking model.
3. Layer 3 (MP3): performs additional processing to provide lower bit rates.

137
MPEG Layer 1 encoder
• Masking Model
It selects the scale factors.
It is driven by a separate FFT
The filter bank could be used for masking
– Multiplexing
The multiplexer at the output of the encoder passes
along all the required data.
MPEG data streams are divided into frames.
Header | CRC | Bit allocation | Scale factors | Subband samples | Aux data
(Data frame format)
138


MPEG Layer 1 encoder

• Filter Bank
It splits the signal into a set of 32 sub bands that are equally spaced in
the frequency domain and cover the entire frequency range of the
audio.
• Encoder
Audio signals tend to be more correlated within a narrow band. So
splitting into sub bands reduce the bit rate.
• Quantizer
It scales each sub band such that it fits within 6 bits of dynamic range,
then quantizes based upon the current scale factor for that sub
band. 139
MPEG Layer 1 Decoder

• A straightforward process.
• After disassembling the data frame, the data are unscaled and inverse quantized to produce sample streams for the subbands.
• An inverse filter bank reassembles the subbands into an uncompressed signal.
• User interface
– A simple display and a few buttons.
• File system
– CD/MP3 players use compact discs.
– Today's players are plugged into USB ports and treated as disk drives.
140
REQUIREMENTS
Name: Audio player.
Purpose: Play audio from a file.
Inputs: Flash socket, on/off, play/stop, menu up/down.
Outputs: Speaker.
Functions: Display list of files in memory, select file to play, play file.
Performance: Sufficient for audio playback.
Power: 1 AAA battery.
Physical weight/size: Approx. 1" x 2", less than 2 oz.

141
SPECIFICATION

142
State diagram for File Display /
Selection

143
State Diagram for Audio Playback

144
System Architecture

• The Cirrus CS7410 is an audio controller


designed for CD/MP3 players.
• 32 bit RISC processor to perform System
controller and audio decoding
• 16 bit DSP to perform audio effects such as
equalization
• Memory controller- Flash, DRAM, SRAM

145
System Architecture

146
Component Design and Testing
• The audio decompression object can be implemented
from existing code or created as new software.
• In case of an audio system that does not conform to a
standard, an audio compression program must be
created to create test files
• The file system and the user interface can be tested
independently of the audio decompression system.
• The audio output system should be tested separately
from the compression system
• Testing of audio decompression requires sample
audio files 147
System Integration and Debugging
• This ensures that audio plays smoothly and without interruption.
• Any file access and output operations are tested separately using a recognizable test signal.

148
Design Example – Video Accelerator

149
VIDEO ACCELERATOR
A hardware circuit on display
adapter that speeds up full
motion video.
Functions
✔Color space conversion
-Converts YUV to RGB
✔Hardware scaling
-To enlarge the image to
full screen
✔Double buffering

150
Video Compression
• MPEG-2 Compression Algorithm is the best
algorithm for Video Accelerator
• MPEG-2 forms basis for U.S HDTV broadcasting
• Discrete Cosine Transform(DCT) plays key role in
MPEG-2
• In image compression, the DCT of a block of
pixels is quantized and subjected to lossless
variable length coding to reduce the number of
bits required to represent the block

151
Block diagram of MPEG-2
Compression Algorithm

152
Block Motion Estimation
• JPEG-style compression alone does not reduce video bandwidth enough for many applications.
• Hence MPEG uses motion to encode one frame in terms of another.
• Instead of sending each frame separately (as in motion JPEG), some frames are sent as modified forms of other frames using this technique.
• During encoding, the frame is divided into macroblocks.
• Macroblocks from one frame are identified in other frames using correlation.
• The frame can then be encoded using the motion vector, without explicitly transmitting all of the pixels; the vector describes the motion of the macroblock from one frame to another.
153
Encoder
• It uses a feedback loop to improve image quality.
• It uses the encoding information to recreate the lossily encoded picture, compares it to the original frame, and generates an error signal that allows the receiver to fix smaller errors.
Decoder
• The decoder keeps some recently decoded frames in memory in order to retrieve the pixel values of macroblocks.
154
Concept of Block Motion Estimation
• The main goal is to perform a two-dimensional correlation to find the best match between regions in the two frames.
• The current frame is divided into 16x16 macroblocks.
• For every macroblock, the region in the previous frame that most closely matches the macroblock is found.
• This is done at various offsets in the search area.

155
Block Motion Estimation

156
Intensity
• An 8-bit luminance value that represents a monochrome pixel.
Motion vector
• The macroblock position chosen is the one, relative to the search area, that gives the smallest value for this metric.
• The offset at this chosen position, describing a vector from the search area center to the macroblock's center, is called the motion vector.
157
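A software reference for what the accelerator computes, assuming a sum-of-absolute-differences (SAD) metric over 16x16 macroblocks; the frame layout and search range are illustrative assumptions, and frame-boundary checks are omitted for brevity:

#include <stdlib.h>
#include <limits.h>

#define MB 16   /* macroblock is 16 x 16 pixels */

/* SAD between the macroblock at (mx,my) in cur and a candidate at (sx,sy) in prev */
static long sad(const unsigned char *cur, const unsigned char *prev,
                int width, int mx, int my, int sx, int sy) {
    long sum = 0;
    for (int y = 0; y < MB; y++)
        for (int x = 0; x < MB; x++)
            sum += abs(cur[(my + y) * width + mx + x] -
                       prev[(sy + y) * width + sx + x]);
    return sum;
}

/* full search over a +/- range window; writes the best offset to (*vx, *vy) */
void motion_vector(const unsigned char *cur, const unsigned char *prev,
                   int width, int mx, int my, int range, int *vx, int *vy) {
    long best = LONG_MAX;
    for (int dy = -range; dy <= range; dy++)
        for (int dx = -range; dx <= range; dx++) {
            long d = sad(cur, prev, width, mx, my, mx + dx, my + dy);
            if (d < best) { best = d; *vx = dx; *vy = dy; }
        }
}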
Requirements
Name: Block motion estimator
Purpose: Block motion estimation in a PC
Inputs: Macroblocks, search areas
Outputs: Motion vectors
Functions: Compute motion vectors with full search
Performance: As fast as possible
Manufacturing cost: Hundreds of dollars
Power: From PC power supply
Physical size/weight: PCI card

158
Specifications
There are three classes used to describe the data types:
1. Motion-vector class (attributes x, y : pos)
2. Macroblock class (attribute pixels[] : pixelval)
3. Search-area class (attribute pixels[] : pixelval)
(Figure: basic classes for the video accelerator, including the PC and Motion-estimator classes, the latter with memory[] and compute-mv().)
159
Architecture

160
• Two memories:
– one for the macroblock,
– another for the search area.
• 16 processing elements (PEs)
– each performs the difference calculation on a chain of pixels.
• Comparator
– sums the results from the 16 PEs and selects the best value to find the motion vector.
• The schedule fetches one pixel from the macroblock memory and two pixels from the search-area memory per clock cycle.
• These pixels are distributed to the processing elements.
161
• This schedule computes 16 correlations between the macroblock and the search area simultaneously.
• The computations for each correlation are distributed among the processing elements.
• The comparator collects the results, finds the best match value, and remembers the corresponding motion vector.

162
Object Diagram for Video
Accelerator

163
Component Design
• The accelerator board will have its own driver that is
responsible for talking to the board
• Since most of the data transfers are performed directly by the
board using DMA, the driver can be relatively simple.

System Testing
• Testing video algorithms requires a large amount of data.
• It is easy to use still images, rather than video, as test data.

164
