Unit 3
🡺 Process
🡺 Multiprogramming
🡺 Multitasking
🡺 Multiprocessing
🡺 Multithreading
Process
🡺A single execution of a program is called a
process.
🡺If we run the same program two different times,
we have created two different processes.
🡺Each process has its own state that includes not
only its registers but all of its memory.
🡺In some OSs, the memory management unit is used
to keep each process in a separate address space.
🡺In others, particularly lightweight RTOSs, the
processes run in the same address space.
🡺 Processes that share the same address space are
often called threads.
Multiprogramming
Multitasking
🡺 Multitasking is the ability of an operating
system to execute more than one task
simultaneously on a single-processor machine.
🡺In reality, however, no two tasks on a
single-processor machine can execute at
the same time.
🡺The CPU actually switches from one task to the next
so quickly that it appears as if all the tasks
are executing at the same time.
Multitasking System
Multiprocessing
🡺Multiprocessing is the ability of an operating
system to execute more than one process
simultaneously on a multiprocessor machine.
🡺In this, a computer uses more than one CPU at a
time.
Multithread
🡺 Threads are lightweight processes, each an independent part of a
process or program
🡺 Processes that share the same address space are often called threads
🡺Multithreading is the ability of an operating
system to execute the different parts of a
program called threads at the same time.
Tasks and Processes
Multirate Systems
🡺In multirate systems, certain operations must be
executed periodically, and each operation is
executed at its own rate
🡺 Timing Requirements on processes
Pre-emptive real-time operating systems
• A preemptive real-time operating system (RTOS) solves
the fundamental problems of a cooperative
multitasking system
• An RTOS executes processes based upon timing
constraints provided by the system designer.
• The most reliable way to meet timing
constraints accurately is to build a preemptive
OS and to use priorities to control what
process runs at any given time
Preemptive real-time operating systems
– Priorities
Preemption
• Pre-emption is an alternative to the C function
call as a way to control execution
• The kernel is the part of the OS that
determines what process is running
Context Switching
• The set of registers that defines a process is known
as its context
• Switching from one process’s register set
to another is known as context switching
• The data structure that holds the state of the process is
known as a record
Process Priorities
• Each process is assigned a numerical priority
• The kernel simply looks at the processes and their
priorities and selects the highest-priority process that
is ready to run
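The selection rule and the context switch described above can be sketched as follows. This is a minimal illustration, not an actual kernel: the `ProcessRecord` class, its fields, and the convention that a larger number means a higher priority are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessRecord:
    """Hypothetical process record: saved context plus scheduling info."""
    name: str
    priority: int                 # assumption: larger number = higher priority
    ready: bool = True
    registers: dict = field(default_factory=dict)  # saved register set (context)


def select_next(processes):
    """Return the highest-priority ready process, or None if none is ready."""
    ready = [p for p in processes if p.ready]
    return max(ready, key=lambda p: p.priority) if ready else None


def context_switch(cpu_registers, current, nxt):
    """Save the running process's registers, then load the next one's."""
    if current is not None:
        current.registers = dict(cpu_registers)
    cpu_registers.clear()
    cpu_registers.update(nxt.registers)
    return nxt
```

A process that is not ready is skipped even if its priority is the highest, which is why the kernel must consider both the priority and the ready state.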
Process and Context
Process and Object oriented design
• UML often refers to processes as active objects, that
is, objects that have independent threads of control.
• The class that defines an active object is known as an
active class.
• It has all the normal characteristics of a class,
including a name, attributes and operations.
• It also provides a set of signals that can be used to
communicate with the process.
Priority Based Scheduling
• Round-Robin Scheduling
• Process Priorities
• Shared Resources
• Priority Inversion
Round-Robin Scheduling
• Round robin is a preemptive process
scheduling algorithm.
• Each process is given a fixed time to execute,
called a quantum.
• Once a process has executed for its quantum,
it is preempted and another process
executes for its quantum.
• Context switching is used to save the states of
preempted processes.
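The behaviour above can be sketched with a simple queue. The process names and burst times are made up for illustration, and context-switch overhead is ignored.

```python
from collections import deque


def round_robin(burst_times, quantum):
    """Return the sequence of (process, time_run) slices under round robin."""
    queue = deque(burst_times.items())   # (name, remaining time), arrival order
    slices = []
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)
        slices.append((name, run))
        if remaining > run:
            # Quantum expired: the process is preempted and rejoins the queue.
            queue.append((name, remaining - run))
    return slices


schedule = round_robin({"P1": 5, "P2": 3, "P3": 1}, quantum=2)
```

With a quantum of 2, `schedule` is P1, P2, P3, P1, P2, P1: no process waits longer than one full round before running again.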
Process Priorities
• Priority scheduling is a non-preemptive
algorithm and one of the most common
scheduling algorithms in batch
systems.
• Each process is assigned a priority. The process
with the highest priority is executed first, and
so on.
• Processes with the same priority are executed on
a first-come, first-served basis.
• Priority can be decided based on memory
requirements, time requirements or any other
resource requirement.
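A minimal sketch of this ordering rule, assuming that a larger number means a higher priority and that the input list is already in arrival order (both are assumptions for the example):

```python
def priority_order(processes):
    """processes: list of (name, priority) in arrival order.

    Python's sort is stable, so processes with equal priority keep their
    arrival order, which gives the first-come, first-served tie-break.
    """
    return [name for name, _ in sorted(processes, key=lambda p: -p[1])]


order = priority_order([("A", 2), ("B", 5), ("C", 2), ("D", 5)])
```

Here `order` is B, D, A, C: the two priority-5 processes run first in arrival order, followed by the two priority-2 processes.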
Rate Monotonic scheduling
• Rate-monotonic scheduling (RMS) was introduced
by Liu and Layland
• It is one of the first scheduling policies
developed for real-time systems and is still
very widely used
• Rate Monotonic Scheduling (RMS) assigns task
priorities in the order of the highest task
frequencies, i.e. the shortest periodic task
gets the highest priority, then the task with
the next-shortest period gets the second-highest
priority, and so on.
• RMS uses a relatively simple model of
the system
– All processes run periodically on a single CPU.
– Context switching time is ignored.
– There are no data dependencies between
processes.
– The execution time for a process is constant.
– All deadlines are at the ends of their periods.
– The highest-priority ready process is always
selected for execution.
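Under the model above, priority assignment and a quick schedulability check can be sketched as follows. The task set is hypothetical. The bound n(2^(1/n) - 1) is Liu and Layland's sufficient test: a task set whose utilization is at or below it is schedulable under RMS, while a set above it may or may not be.

```python
def rms_priorities(periods):
    """Return task names ordered from highest to lowest RMS priority."""
    return sorted(periods, key=lambda name: periods[name])


def utilization(tasks):
    """tasks: dict name -> (execution_time, period)."""
    return sum(c / p for c, p in tasks.values())


def rms_bound(n):
    """Liu and Layland's sufficient utilization bound for n tasks."""
    return n * (2 ** (1 / n) - 1)


# Hypothetical task set: name -> (execution time, period)
tasks = {"P1": (1, 4), "P2": (1, 5), "P3": (2, 10)}
order = rms_priorities({name: p for name, (_, p) in tasks.items()})
guaranteed = utilization(tasks) <= rms_bound(len(tasks))
```

For this set the utilization is 0.65, which is below the three-task bound of about 0.78, so all deadlines are guaranteed to be met.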
Earliest Deadline First Scheduling
• Earliest Deadline First (EDF) is a dynamic
priority algorithm
• The priority of a job is inversely proportional
to its absolute deadline
• In other words, the highest priority job is the
one with the earliest deadline
• Example
Task   Execution Time   Period
T1     1                4
T2     2                6
T3     3                8
• Observe that under a fixed-priority (rate-monotonic) schedule of this task
set, at time 6 the scheduler runs task 2 even though the deadline of task 3
is very close.
• This is the main reason why T3 misses its deadline.
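The example task set can be checked with a small unit-time simulation. This is a sketch under simplifying assumptions: time advances in whole units, every deadline falls at the end of its period, ties are broken by task name, and a job that misses its deadline is dropped. The function name and data layout are made up for illustration.

```python
from functools import reduce
from math import gcd


def simulate(tasks, policy):
    """Simulate one hyperperiod in unit time steps.

    tasks: dict name -> (execution_time, period).
    policy: "rm" (shortest period first) or "edf" (earliest deadline first).
    Returns the list of (task, deadline) pairs that were missed.
    """
    horizon = reduce(lambda a, b: a * b // gcd(a, b),
                     (p for _, p in tasks.values()))
    jobs, missed = [], []
    for t in range(horizon + 1):
        # Jobs whose deadline has arrived unfinished are recorded as misses.
        for job in jobs:
            if job["deadline"] == t and job["left"] > 0:
                missed.append((job["name"], t))
        jobs = [j for j in jobs if j["deadline"] > t and j["left"] > 0]
        if t == horizon:
            break
        # Release a new job of every task whose period starts now.
        for name, (c, p) in tasks.items():
            if t % p == 0:
                jobs.append({"name": name, "left": c,
                             "period": p, "deadline": t + p})
        if jobs:
            key = "period" if policy == "rm" else "deadline"
            job = min(jobs, key=lambda j: (j[key], j["name"]))
            job["left"] -= 1   # run the chosen job for one time unit
    return missed


tasks = {"T1": (1, 4), "T2": (2, 6), "T3": (3, 8)}
rm_misses = simulate(tasks, "rm")
edf_misses = simulate(tasks, "edf")
```

Under the rate-monotonic policy the simulation reports that T3 misses its deadline at time 8, because T2 preempts it at time 6. Under EDF no deadline is missed, which is expected since the total utilization (23/24) is below 1.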
Shared Resources
•Race Condition
•Critical Sections
•Semaphores
Shared Resources
• Race Condition
– Consider the case in which an I/O device has a flag
that must be tested and modified by a process
• Critical Sections
– To prevent race condition problems, we need to
control the order in which some operations occur
– We need to be sure that a task finishes an I/O
operation before allowing another task to start its
own operation on that I/O device
– This is achieved by enclosing sensitive sections of code
in a critical section that executes without
interruption
• Semaphores
– We create a critical section using semaphores,
which are primitives provided by the OS
– The semaphore is used to guard a resource
– We start a critical section by calling a semaphore
function that does not return until the resource is
available
– When we are done with the resource we use
another semaphore function to release it
P(); //wait for semaphore
//do protected work here
V(); //release semaphore
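The P()/V() pattern above maps onto the `acquire` and `release` methods of Python's standard `threading.Semaphore`. The sketch below is only an illustration: the shared counter stands in for the guarded resource, and the thread and iteration counts are arbitrary.

```python
import threading

semaphore = threading.Semaphore(1)   # one unit of the guarded resource
counter = 0                          # stands in for the shared resource state


def worker(iterations):
    global counter
    for _ in range(iterations):
        semaphore.acquire()          # P(): wait for semaphore
        try:
            counter += 1             # do protected work here
        finally:
            semaphore.release()      # V(): release semaphore


def run(num_threads=4, iterations=1000):
    """Run several workers and return the final value of the counter."""
    global counter
    counter = 0
    threads = [threading.Thread(target=worker, args=(iterations,))
               for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter
```

Because every update happens inside the critical section, `run(4, 1000)` always returns 4000.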
Priority Inversion
INTERPROCESS COMMUNICATION MECHANISMS
– Message passing
– Signals
– Mailboxes
Shared Memory
EX: Shared Memory Communication
(Elastic buffers as shared memory)
The text compressor uses the CPU to compress
incoming text, which is then sent on a serial
line by a UART.
Message passing
•For example, a home control system has one
microcontroller per household device: lamp, fan,
and so on.
•The devices must communicate relatively
infrequently.
•Their physical separation is large enough that we
would not naturally have them share a central pool of
memory.
•Passing communication packets among the devices is
a natural implementation of communication in
many 8-bit controllers.
Signals
•Another form of inter-process communication
commonly used in Unix is the signal.
•A signal is analogous to an interrupt, but it is entirely
a software creation.
•A signal is generated by a process and transmitted to
another process by the operating system.
Mailboxes
• A mailbox is an asynchronous communication mechanism.
• Mailboxes have a fixed number of bits and can
be used for small messages.
• We can also implement a mailbox using P()
and V() using main memory for the mailbox
storage.
• Mail box should contain two items:
– Message
– Mail ready Flag
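A single-slot mailbox with a message and a mail-ready flag can be sketched as below. The class and method names are hypothetical, and a `threading.Lock` plays the role of the P()/V() semaphore that guards the mailbox storage.

```python
import threading


class Mailbox:
    """Single-slot mailbox: a message plus a mail-ready flag."""

    def __init__(self):
        self._lock = threading.Lock()   # stands in for the P()/V() semaphore
        self.message = None
        self.ready = False              # mail-ready flag

    def post(self, message):
        """Store a message; return False if the previous one is still waiting."""
        with self._lock:
            if self.ready:
                return False
            self.message, self.ready = message, True
            return True

    def pickup(self):
        """Return the waiting message and clear the flag, or None if empty."""
        with self._lock:
            if not self.ready:
                return None
            self.ready = False
            return self.message
```

The sender never blocks waiting for the receiver, which is what makes the communication asynchronous; in this sketch it simply learns from the return value whether the slot was free.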
Distributed Embedded Systems
• Embedded system Architecture
• CAN Bus
• I2C Bus
• Ethernet
• Internet
Embedded System Architecture
• Network Abstraction
CAN Bus
• CAN stands for Controller Area Network
• Example: a network of embedded systems in an
automobile
• An automobile uses a number of distributed embedded
controllers
• The controllers provide the controls for brakes,
engines, electrical power, lamps, temperature, air
conditioning, car gate, front display panel, meter
display panels and cruising
CAN Protocol defined frame bits
• First field of 12 bits: arbitration field
– 11-bit destination address and RTR (Remote
Transmission Request) bit
– The destination device address is specified in an 11-bit
sub-field; a 1-bit sub-field specifies whether the frame
carries data for the device or is a request to the device
– A maximum of 2^11 devices can connect to a CAN
controller with the standard 11-bit address field
– If RTR = 0 (dominant), the packet carries data for the device at the
destination address
– If RTR = 1 (recessive), the packet is a request for data
• Second field of 6 bits: control field
– The first bit is for the identifier’s extension.
– The second bit is reserved and is sent as '0' (dominant).
– The last 4 bits specify the data length code
• Third field of 0 to 64 bits: data field
– Its length depends on the data length code in the
control field.
• Fourth field of 16 bits: CRC (Cyclic Redundancy
Check) bits
– The receiver node uses it to detect errors, if any,
during the transmission.
• Fifth field of 2 bits: ACK field
• First bit: 'ACK slot'
– The sender transmits '1' (recessive) in this slot, and a receiver
that has received the frame correctly overwrites it with
'0' (dominant). If the sender does not sense '0' in the ACK slot,
it generally retransmits the data frame.
• Second bit: 'ACK delimiter'
– It signals the end of the
ACK field. If the transmitting node does not
receive any acknowledgement of the data frame
within a specified time slot, it should retransmit.
• Sixth field of 7 bits
– End-of-frame specification, consisting of seven '1's (recessive bits)
• Interframe bits
– A minimum of 3 bits separates two CAN frames
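The field widths listed above can be collected in a short sketch that packs the arbitration field and adds up the frame length. The function names are hypothetical. The total counts only the six fields described here and leaves out the start-of-frame bit, bit stuffing, and the interframe bits.

```python
# Widths, in bits, of the fixed-size fields described above.
FIELD_BITS = {"arbitration": 12, "control": 6, "crc": 16,
              "ack": 2, "end_of_frame": 7}


def arbitration_field(identifier, rtr):
    """Pack an 11-bit identifier and the RTR bit into the 12-bit field."""
    if not 0 <= identifier < 2 ** 11:
        raise ValueError("identifier must fit in 11 bits")
    if rtr not in (0, 1):
        raise ValueError("RTR must be 0 or 1")
    return (identifier << 1) | rtr


def frame_bits(data_bytes):
    """Total bits of the listed fields for a payload of 0 to 8 bytes."""
    if not 0 <= data_bytes <= 8:
        raise ValueError("a CAN data field holds 0 to 8 bytes")
    return sum(FIELD_BITS.values()) + 8 * data_bytes


field_value = arbitration_field(0x123, 0)
```

The fixed fields add up to 43 bits, so a frame with a full 8-byte payload carries 107 bits across these fields.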
I2C
• The name stands for “Inter-Integrated Circuit”
bus
• A small area network connecting ICs and
other electronic systems
• Developed by Philips Semiconductors
• I2C can support up to 128 slave devices
• Today, a variety of devices are available with
I2C interfaces
– Microcontroller, EEPROM, real-time clock, interface
chips, LCD driver, A/D converter
• Used for moving data simply and quickly from one
device to another
• Low cost, easy to implement and of moderate speed
• Serial Interface
• I2C is a synchronous protocol that allows a master
device to initiate communication with a slave device.
• I2C is also bi-directional: data can be sent in either
direction on the serial data line (SDA), by the master
or the slave.
• I2C is a master-slave protocol in which:
– The master device controls the clock (SCL)
– The slave devices may hold the clock low to
prevent data transfer
• SDA
– This signal is known as Serial Data. Any data sent
from one device to another goes on this line
• SCL
– This is the Serial Clock signal. It is generated by the
master device and controls when data is sent and
when it is read.
I2C distance
• Synchronous Serial Communication
– 400 kbps up to 2 m and
I2C Bus operation
• The SDA line transmits and receives data bits (MSB is sent
first)
• Data on the SDA line is stable while the clock (SCL) is high
• The serial clock is driven by the master
• The acknowledgment bit is driven by the receiver after
the end of reception
• If the receiver does not acknowledge, the SDA line
remains high
[Figure: SDA and SCL waveforms from MSB to LSB, with the ACK bit driven by the receiver]
• The format of I2C bus transfers
Start Bit
• Initializes I2C Bus
• SDA is pulled low, then SCL is pulled low
Stop Bit
• Releases I2C Bus
• SCL is released first, then SDA is released
Control bits
• The start bit is followed by a 7-bit slave address
• The address bits are followed by an R/W bit
• If R/W=0, subsequent bytes transmitted on
the bus will be written by the controller to the
selected peripheral
• If R/W=1, subsequent bytes will be sent by the
selected peripheral and read by the controller
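The first byte after the start bit therefore combines the 7-bit slave address with the R/W bit in the least significant position. A minimal sketch follows; the function name and the address 0x50 are only illustrative.

```python
def i2c_address_byte(slave_address, read):
    """Build the byte sent after the start bit: 7-bit address, then R/W."""
    if not 0 <= slave_address < 128:
        raise ValueError("slave address must fit in 7 bits")
    return (slave_address << 1) | (1 if read else 0)


write_byte = i2c_address_byte(0x50, read=False)   # R/W = 0: controller writes
read_byte = i2c_address_byte(0x50, read=True)     # R/W = 1: controller reads
```

For address 0x50 the write byte is 0xA0 and the read byte is 0xA1.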
I2C typical message format
• For a peripheral chip that contains more than
one internal register or memory address, the
PIC will typically write a second byte to the
chip to set a pointer to the selected internal
register or the consecutive addresses that
follow it
I2C Advantages
• It is faster than asynchronous serial communication
• Few pins are required for communication
(only 2 pins)
I2C Disadvantages
• Communication is more complex than UART or
SPI
• I2C draws more power than other serial
communication buses
• Slower devices can slow down the
operation of faster devices
Ethernet
• Very widely used as a local area network for
general purpose computing.
• Ethernet nodes are not synchronized (they can send their bits at
any time)
• If two nodes decide to transmit at the same time, the
message will be ruined
Ethernet Packet format
Disadvantages
• Ethernet was not designed to support real time
operations
• The exponential backoff scheme cannot guarantee
delivery time of any data
• Three ways to reduce the variance in Ethernet’s
packet delivery time
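The reason the backoff scheme cannot bound delivery time is easiest to see in code. The sketch below assumes the classic truncated binary exponential backoff, in which the random range doubles after each collision and stops growing after 10 collisions. The function name is hypothetical.

```python
import random


def backoff_slots(collisions, rng=random):
    """Pick a random wait, in slot times, after a number of collisions."""
    exponent = min(collisions, 10)        # assumption: range capped at 2**10
    return rng.randrange(2 ** exponent)   # uniform in 0 .. 2**exponent - 1


# After the third collision a node waits between 0 and 7 slot times.
wait = backoff_slots(3)
```

Since each retry waits a random time and may collide again, the delay before a packet finally gets through has no fixed upper bound, which is the real-time problem noted above.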
Protocol utilization in Internet Communication
IP Packet structure
• The maximum total length of the header and data payload is
65,535 bytes
• An Internet address is 32 bits in the early version (IPv4)
Internet service stack
SNMP: Simple Network Management Protocol
UDP: User Datagram Protocol
MPSoCs and Shared Memory Multiprocessors
• MPSoC 🡺 Multi-Processor System on Chip
• Shared memory processors 🡺 well suited to processing large amounts of
data
Accelerated systems
• Use an additional computational unit dedicated
to some functions
– Hardwired logic
– Extra CPU
• Hardware/software co-design: joint design of
hardware and software architectures
Accelerated system design
• First, determine whether the system really
needs to be accelerated
– How much faster is the accelerator on the core
function?
Accelerated system architecture
Accelerator/CPU interface
Accelerator implementations
• Standard component
– Example: graphics processor
Xilinx Zynq-7000
• AMBA bus connects to CPUs and FPGA fabric
Accelerator Performance Analysis
• Critical parameter is speedup: how much
faster is the system with the accelerator?
Accelerator execution time
Accelerator speedup
Single- vs. multi-threaded
Total execution time
[Figure: single-threaded vs. multi-threaded execution timelines for processes P1 to P4 and accelerator A1]
Design Example – Engine Control Unit
Engine Control Unit: Theory of operation
Engine controller data periods
Specifications
Engine control calculations
• Compute initial values:
\[ PW = \frac{2.5}{2 \times NE} \times VS \times \frac{1}{10 - k_1 \Delta T} \]
\[ S = k_2 \times \Delta NE - k_3 VS \]
Engine controller class diagram
Throttle position sensing state diagram
Injector pulse width and spark advance angle
Component design and testing
• Processes at different periods must share state
variables.
• Engine control is mode-dependent and requires multiple
test cases.
• SAE has standards for coding practice, software
development, requirements, and
verification/validation.
System testing
• The engine ignition system creates electrical noise that can
prevent an unshielded controller from operating.
• The engine compartment runs at high temperature.
Design Example – Audio Player
A media player that can play only audio files is often called an
MP3 player.
Functions:
1. Audio Compression
2. Audio Decompression
3. Masking
Audio Decompression
• The incoming bit stream is encoded using a Huffman code, which
must be decoded. The audio data is then applied to a
reconstruction filter.
Audio Compression
• A lossy process that relies on perceptual coding to encode the audio in
fewer bits. The coder eliminates certain features of the audio
stream.
Masking
One tone can be masked by another if the tones are sufficiently
close in frequency or time.
Standards
MP3 comes from MPEG-1 Layer 3
The MPEG-1 standard defines 3 layers of audio compression
1. Layer 1 (MP1): lossless compression of sub-bands and an
optional simple masking model
2. Layer 2 (MP2): uses a more advanced masking model
3. Layer 3 (MP3): performs additional processing to provide lower
bit rates
MPEG Layer 1 encoder
• Masking Model
– It selects the scale factors.
– It is driven by a separate FFT.
– The filter bank could be used for masking.
• Multiplexing
– The multiplexer at the output of the encoder passes
along all the required data.
– MPEG data streams are divided into frames.
– Frame format: Header | CRC | Bit allocation | Scale factors | Sub-band samples | Aux data
• Filter Bank
– It splits the signal into a set of 32 sub-bands that are equally spaced in
the frequency domain and cover the entire frequency range of the
audio.
• Encoder
– Audio signals tend to be more correlated within a narrow band, so
splitting the signal into sub-bands reduces the bit rate.
• Quantizer
– It scales each sub-band so that it fits within 6 bits of dynamic range,
then quantizes based upon the current scale factor for that
sub-band.
MPEG Layer 1 Decoder
SPECIFICATION
State diagram for File Display / Selection
State Diagram for Audio Playback
System Architecture
Component Design and Testing
• The audio decompression object can be implemented
from existing code or created as new software.
• In the case of an audio system that does not conform to a
standard, an audio compression program must be
written to create test files
• The file system and the user interface can be tested
independently of the audio decompression system.
• The audio output system should be tested separately
from the compression system
• Testing of audio decompression requires sample
audio files
System Integration and Debugging
• Integration must ensure that audio plays smoothly and
without interruption
• File access and audio output operations can be
tested separately using a recognizable test
signal
Design Example – Video Accelerator
VIDEO ACCELERATOR
A hardware circuit on a display
adapter that speeds up full-motion
video.
Functions
✔ Color space conversion
- Converts YUV to RGB
✔ Hardware scaling
- Enlarges the image to
full screen
✔ Double buffering
Video Compression
• The MPEG-2 compression algorithm is well suited to
a video accelerator
• MPEG-2 forms the basis for U.S. HDTV broadcasting
• The Discrete Cosine Transform (DCT) plays a key role in
MPEG-2
• In image compression, the DCT of a block of
pixels is quantized and subjected to lossless
variable-length coding to reduce the number of
bits required to represent the block
Block diagram of MPEG-2 Compression Algorithm
Block Motion Estimation
• JPEG-style compression alone does not reduce video
bandwidth enough for many applications
• Hence MPEG uses motion to encode one frame in terms of
another
• Rather than sending each frame separately, as in motion JPEG,
some frames are sent as modified forms of other frames
using block motion estimation
• During encoding, the frame is divided into macroblocks.
• Macroblocks from one frame are identified in other frames
using correlation
• The frame can then be encoded using the vector that
describes the motion of the macroblock from one frame to
another, without explicitly transmitting all the pixels
Encoder
• It uses a feedback loop to improve image
quality
• It uses the encoding information to recreate the
lossily encoded picture, compares it to the
original frame, and generates an error signal that the
receiver can use to fix smaller errors
Decoder
• It keeps some recently decoded frames in
memory to retrieve the pixel values of
macroblocks
Concept of Block Motion Estimation
• The main goal is to perform a two-dimensional correlation to find the
best match between regions in the two frames
• The current frame is divided into 16×16 macroblocks
• For every macroblock, the region in the previous frame
that most closely matches the macroblock is found
• This is done at various offsets in the search area
Intensity
• An 8-bit luminance value that
represents a
monochrome pixel
Motion vector
• The macroblock position relative to the search
area that gives the smallest value for this
metric is chosen.
• The offset at this chosen
position, describing a
vector from the search
area center to the
macroblock's center, is
called the motion vector
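A full search over the search area can be sketched in a few lines. The matching metric is not named in the text above, so this example assumes the sum of absolute differences (SAD) between 8-bit luminance values. The function names are hypothetical, and both blocks are assumed to be square lists of rows.

```python
def sad(macroblock, search_area, ox, oy):
    """Sum of absolute differences at offset (ox, oy) in the search area."""
    return sum(
        abs(macroblock[y][x] - search_area[oy + y][ox + x])
        for y in range(len(macroblock))
        for x in range(len(macroblock[0]))
    )


def motion_vector(macroblock, search_area):
    """Full search: return (dx, dy) from the search-area center to the best match."""
    n, m = len(macroblock), len(search_area)
    center = (m - n) // 2                 # offset that centers the macroblock
    _, ox, oy = min(
        (sad(macroblock, search_area, ox, oy), ox, oy)
        for oy in range(m - n + 1)
        for ox in range(m - n + 1)
    )
    return ox - center, oy - center


# Tiny example: a 2x2 block placed at offset (3, 1) in a 6x6 search area.
block = [[9, 8], [7, 6]]
area = [[0] * 6 for _ in range(6)]
for y in range(2):
    for x in range(2):
        area[1 + y][3 + x] = block[y][x]
vector = motion_vector(block, area)
```

In this example the best match sits one pixel to the right of and one pixel above the centered position, so `vector` is `(1, -1)`. A real macroblock is 16×16, and the search is the costly part that the accelerator parallelizes.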
Requirements
name                   Block motion estimator
purpose                Block motion estimation in a PC
inputs                 Macroblocks, search areas
outputs                Motion vectors
functions              Compute motion vectors with full search
performance            As fast as possible
manufacturing cost     Hundreds of dollars
power                  From PC power supply
physical size/weight   PCI card
Specifications
There are three classes used to describe data
types
1. Motion-vector class
2. Macroblock class
3. Search-area class
Class              Members
Motion-vector      x, y : pos
Macroblock         pixels[] : pixelval
Search-area        pixels[] : pixelval
PC                 memory[]
Motion-estimator   compute-mv()
Basic classes for the video accelerator
Architecture
• Two memories
– One for the macroblock
– Another for the search area
• 16 processing elements (PEs)
– Perform the difference calculation on a chain of
pixels
• Comparator
– Sums the results of the 16 PEs and selects the best value to
find the motion vector
• The schedule fetches one pixel from macroblock
memory and two pixels from search area
memory per clock cycle
• These pixels are distributed to the processing elements
• This schedule computes 16 correlations
between the macroblock and search area
simultaneously
• The computations for each correlation are
distributed among the processing elements
Object Diagram for Video Accelerator
Component Design
• The accelerator board will have its own driver that is
responsible for talking to the board
• Since most of the data transfers are performed directly by the
board using DMA, the driver can be relatively simple.
System Testing
• Testing video algorithms requires a large amount of data.
• It is easier to use still images rather than video as test data.