Week2 - 1
Magnetic Memory
• In core storage, each ferrite ring can represent a 0 or 1 bit, depending
on its magnetic state.
Semiconductor Memory
• Since 1970, semiconductor memory has gone through 13 generations: 1k, 4k, 16k, 64k, 256k, 1M, 4M, 16M, 64M, 256M, 1G, 4G, and, as of this writing, 8 Gb on a single chip (1k = 2^10, 1M = 2^20, 1G = 2^30).
• Just as the density of elements on memory chips has continued to rise, so has the density of elements on processor chips.
• As time went on, more and more elements were placed on each chip, so that fewer and fewer chips were needed to construct a single computer processor.
Intel
1971 – 4004
• First microprocessor
• All CPU components on a single chip
• Added 4-bit numbers
• Multiplied by repeated addition
1972 – 8008
• 8-bit
• Both the 4004 and 8008 were designed for specific applications
1974 – 8080
• Intel's first general-purpose microprocessor
Dr. Syed Atif Ali Shah 5
Evolution of Intel Microprocessors
Memory/Storage Architecture: Historical Trend
The gap between compute and memory/storage is increasing
• Pipelining
• Branch prediction
• Superscalar execution
• Data flow analysis
• Speculative execution
Pipelining
• Pipelining enables a processor to work simultaneously on multiple instructions.
• For example, while one instruction is being executed, the computer is decoding the next instruction.
https://www.researchgate.net/publication/352189762_Advancements_in_Microprocessor_Architecture_for_Ubiquitous_AI-An_Overview_on_History_Evolution_and_Upcoming_Challenges_in_AI_Implementation
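The payoff of this overlap can be sketched with a simple cycle-count model (an illustrative back-of-the-envelope calculation, not from the slides; it assumes an ideal pipeline with no stalls, and the function names are my own):

```python
def sequential_cycles(n_instr, n_stages):
    # Without pipelining, each instruction occupies all stages alone.
    return n_instr * n_stages

def pipeline_cycles(n_instr, n_stages):
    # With pipelining, the first instruction takes n_stages cycles to
    # flow through; every later instruction completes one cycle after
    # the previous one (assuming no stalls or hazards).
    return n_stages + (n_instr - 1)

# 10 instructions on a 5-stage pipeline: 50 cycles unpipelined vs 14 pipelined.
```

The gap widens with longer instruction streams: throughput approaches one instruction per cycle regardless of pipeline depth.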
Pipelining Limitations
• The speed of a pipeline is eventually limited by its slowest stage.
• For this reason, conventional processors rely on very deep pipelines (e.g., 20-stage pipelines).
• The penalty of a misprediction grows with the depth of the pipeline, since a larger number of instructions will have to be flushed.
Branch prediction
• The processor looks ahead in the instruction code fetched from memory and predicts which branches, or groups of instructions, are likely to be processed next.
• Techniques
• Static Branch Prediction
• Dynamic Branch Prediction
• Static BP: the underlying hardware makes a fixed assumption, either that the branch is always not taken or that it is always taken.
• Dynamic BP: the prediction made by the hardware is not fixed; it changes dynamically with the branch's run-time behavior.
• 1-bit branch prediction technique: a single bit records the branch's last outcome, and the prediction flips after every misprediction.
• 2-bit branch prediction technique: the underlying hardware does not change its assumption after one incorrect prediction; it changes its assumption only after two consecutive wrong predictions.
• Correlating branch prediction technique: the prediction also takes into account the outcomes of other recently executed branches.
• If the processor guesses right most of the time, it can prefetch the correct instructions and buffer them so that the processor is kept busy.
• Thus, branch prediction increases the amount of work available for the processor to execute.
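The 2-bit scheme described above can be modeled as a saturating counter (a minimal illustrative sketch; the class and function names are my own):

```python
class TwoBitPredictor:
    # States 0-1 predict "not taken", states 2-3 predict "taken".
    # Two consecutive mispredictions are needed to flip the prediction.
    def __init__(self):
        self.state = 0

    def predict(self):
        return self.state >= 2  # True means "predict taken"

    def update(self, taken):
        # Saturating counter: move toward 3 on taken, toward 0 on not taken.
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)

def count_mispredictions(trace):
    predictor = TwoBitPredictor()
    misses = 0
    for taken in trace:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses

# For a loop branch taken 4 times then not taken, repeated twice,
# only the warm-up and the loop exits are mispredicted.
```

Note how the loop-exit misprediction does not flip the prediction, so the predictor is still right when the loop restarts; a 1-bit predictor would mispredict twice per loop iteration pattern.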
Branch Prediction Limitations
• However, in typical program traces, every fifth or sixth instruction is a conditional jump!
• This requires very accurate branch prediction.
• The penalty of a misprediction grows with the depth of the pipeline, since a larger number of instructions will have to be flushed.
Superscalar execution
• One simple way of easing these bottlenecks is to use multiple pipelines.
• In effect, multiple parallel pipelines are used, allowing several instructions to be issued per cycle.
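Extending the earlier idealized cycle-count idea (an illustrative sketch, not from the slides; it assumes instructions can always be issued in full groups, and the function name is my own): with a machine of issue width k, up to k instructions enter the pipelines together.

```python
import math

def superscalar_cycles(n_instr, n_stages, width):
    # width instructions issue together each cycle; the last group
    # completes (n_stages - 1) cycles after it issues.
    return n_stages + math.ceil(n_instr / width) - 1

# 10 instructions, 5 stages: 14 cycles at width 1, but only 9 at width 2.
```

In practice the speedup is smaller, since dependent instructions cannot issue together.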
Data flow analysis
• The processor analyzes which instructions are dependent on each other's results, or data, to create an optimized schedule of instructions.
• In fact, instructions are scheduled to be executed when ready, independent of the original program order.
• This prevents unnecessary delay.
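A toy version of such dependence-driven scheduling (an illustrative sketch; the instruction tuples and function name are my own): each instruction names the register it writes and the registers it reads, and instructions whose inputs are ready issue in the same slot, regardless of program order.

```python
def schedule(instrs):
    # instrs: list of (dest_register, [source_registers]) in program order.
    # An instruction is ready one step after all its sources are produced,
    # so independent instructions land in the same issue slot.
    level = {}
    slots = {}
    for dest, srcs in instrs:
        level[dest] = 1 + max((level[s] for s in srcs), default=-1)
        slots.setdefault(level[dest], []).append(dest)
    return [slots[k] for k in sorted(slots)]

program = [
    ("r1", []),            # independent
    ("r2", []),            # independent
    ("r3", ["r1", "r2"]),  # needs r1 and r2
    ("r4", ["r1"]),        # needs only r1, so it can run alongside r3
    ("r5", ["r3", "r4"]),
]
# schedule(program) groups r1 with r2, then r3 with r4, then r5.
```

Even though r4 appears after r3 in program order, the analysis lets them execute together because they do not depend on each other.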
Speculative execution
• Using branch prediction and data flow analysis, the processor speculatively executes instructions ahead of their actual appearance in the program, holding the results in temporary locations.
• This keeps the execution engines as busy as possible.
Performance Balance
• Processor speed has increased.
• Memory capacity has increased.
• Memory speed lags behind processor speed.
Logic and Memory Performance Gap
Memory/Storage Architecture: Historical Trend
The gap between compute and memory/storage is increasing.
The problem created by such mismatches is particularly critical at the interface between processor and main memory.
While processor speed has grown rapidly, the speed with which data can be transferred between main memory and the processor has lagged badly.
The result is a need to look for performance balance: an adjustment of the organization and architecture to compensate for the mismatch among the capabilities of the various components.
Solutions
• The interface between processor and main memory is the most crucial pathway in the entire computer.
• It is responsible for carrying a constant flow of program instructions and data between memory chips and the processor.
• If memory or the pathway fails to keep pace with the processor's insistent demands, the processor stalls in a wait state, and valuable processing time is lost.
Solutions
• A system architect can attack this problem in a number of ways:
• Increase the number of bits that are retrieved at one time by making DRAMs "wider" and by using wide bus data paths.
• Incorporate one or more caches on the processor chip, as well as an off-chip cache close to the processor chip.
Typical I/O Device Data Rates
• Figure 2.1 gives some examples of typical peripheral devices in use on personal computers and workstations, showing data rates (in bps, on a logarithmic scale from 10^1 to 10^11) for the keyboard, mouse, scanner, laser printer, optical disc, hard disk, Wi-Fi modem (max speed), graphics display, and Ethernet modem (max speed).
• While the current generation of processors can handle the data pumped out by these devices, there remains the problem of getting that data moved between processor and peripheral.
I/O Devices
Peripherals with intensive I/O demands
• This design must constantly be reconsidered to cope with two constantly evolving factors:
• The rate at which performance is changing in the various technology areas (processor, buses,
memory, peripherals) differs greatly from one type of element to another.
• New applications and new peripheral devices constantly change the nature of the demand on
the system in terms of typical instruction profile and the data access patterns.
Improvements in Chip Organization and Architecture
Increase hardware speed of processor
Problems with Clock Speed and Logic Density
• Power: as the density of logic and the clock speed on a chip rise, so does the power density, making heat increasingly difficult to dissipate.
• RC delay:
• The speed at which electrons flow is limited by the resistance and capacitance of the metal wires connecting them.
• Delay increases as the RC product increases.
• As wires get thinner, resistance increases.
• As wires get closer together, capacitance increases.
• Memory latency: memory access speeds lag processor speeds.
• Solution: place more emphasis on organizational and architectural approaches, such as increased cache capacity.
https://www.youtube.com/watch?v=IA8au8Qr3lo
Increased Cache Capacity
• Typically, two or three levels of cache between processor and main memory
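The benefit of a cache can be quantified with the standard effective-access-time formula for a single cache level (the latencies and hit rate below are illustrative numbers, not from the slides):

```python
def effective_access_time(hit_rate, t_cache_ns, t_memory_ns):
    # Hits are served at cache speed; misses pay the memory latency.
    return hit_rate * t_cache_ns + (1 - hit_rate) * t_memory_ns

# With a 1 ns cache, 100 ns main memory, and a 95% hit rate, the average
# access takes 5.95 ns -- far closer to cache speed than to memory speed.
```

This is why even a small, fast cache hides most of the processor–memory speed gap, and why adding second and third cache levels (which catch many of the first level's misses) helps further.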
Moore’s law
• Moore observed that the number of transistors that could be put on a single chip was doubling every year.
• He correctly predicted that this pace would continue into the near future.
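Doubling every year compounds very quickly; a small illustration (the starting count of roughly 2,300 transistors is the Intel 4004's; the function itself is my own sketch of exponential growth):

```python
def transistors_after(n0, years, doubling_period_years=1.0):
    # Exponential growth: the count doubles once per doubling period.
    return n0 * 2 ** (years / doubling_period_years)

# Ten yearly doublings multiply the count by 1024: about 2,300
# transistors grows to roughly 2.36 million.
```

Changing `doubling_period_years` to 2 models Moore's later, revised pace of doubling every two years.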
Instruction Set Architecture
• To command a computer's hardware, you must speak its language.
• The words of a computer's language are called instructions.
• The instruction vocabulary is called an instruction set.
• Popular instruction sets include:
• Intel x86
• ARM
x86 ISA
• Early chips were given technical part numbers, such as 8086, 80386, or 80486.
• This led to the commonly used shorthand "x86 architecture," in reference to the last two digits of each chip's part number.
• Beginning in 1993, the "x86" naming convention gave way to more memorable (and pronounceable) product names, such as:
• Intel® Pentium® processor
• Intel® Celeron® processor
x86 Evolution
8080
• Intel's first general-purpose microprocessor
• 8-bit data path
• Used in the first personal computer, the Altair
8086 – 5 MHz – 29,000 transistors
• Much more powerful
• 16-bit
• Instruction cache, prefetches a few instructions
• The 8088 (8-bit external bus) was used in the first IBM PC
80286
• 16 MByte of memory addressable, up from 1 MByte
80386
• 32-bit
• Support for multitasking
80486
• Sophisticated, powerful cache and instruction pipelining
x86 Evolution
Pentium
• Superscalar
• Multiple instructions executed in parallel
Pentium Pro
• Increased superscalar organization
• Aggressive register renaming
• Branch prediction
• Data flow analysis
• Speculative execution
• x86 architecture dominant outside embedded systems
• Organization and architecture evolved with backwards compatibility as technology changed dramatically
• Instruction set: roughly one instruction per month added
Intel Microprocessor Performance
• Internal memory cache: memory found within modern processors, called cache.
• It acts as temporary storage for frequently accessed data and instructions, significantly improving overall system performance.
Multimedia Extensions (MMX):
• Released in 1997 for Intel x86 processors.
• Introduced 57 new instructions specifically designed for multimedia tasks such as audio, video, image processing, and 3D graphics.
• Utilized Single Instruction, Multiple Data (SIMD) architecture to process multiple data elements simultaneously.
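The SIMD idea of packing several small values into one wide register and operating on all of them at once can be mimicked in plain Python (a conceptual sketch of a packed 16-bit add, not actual MMX code; the function names are my own):

```python
def pack16(lanes):
    # Pack four 16-bit values into one 64-bit integer, lane 0 lowest.
    v = 0
    for i, x in enumerate(lanes):
        v |= (x & 0xFFFF) << (16 * i)
    return v

def unpack16(v):
    # Recover the four 16-bit lanes from the packed 64-bit integer.
    return [(v >> (16 * i)) & 0xFFFF for i in range(4)]

def packed_add16(a, b):
    # One "instruction" adds all four lanes at once, suppressing carries
    # across lane boundaries so each lane wraps around modulo 2**16.
    mask = 0x7FFF7FFF7FFF7FFF  # low 15 bits of every lane
    high = 0x8000800080008000  # top bit of every lane
    low = (a & mask) + (b & mask)
    return low ^ ((a ^ b) & high)

# Adding [1, 2, 3, 4] and [10, 20, 30, 40] lane-by-lane yields
# [11, 22, 33, 44] -- four additions in a single operation.
```

This is exactly the shape of MMX's packed-word addition: one wide register, several independent lanes, one instruction.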
Hyper-Threading:
• Imagine a single core with processing resources such as execution units and registers.
• During normal operation, these resources are used by one thread at a time.
• Hyper-threading creates virtual copies of these resources, allowing two threads to "share" the core, essentially multitasking within the same space.
• Though the threads still need to wait for each other for certain tasks, hyper-threading enables efficient utilization of idle resources during wait times, potentially boosting overall performance.
• The x86 architecture has a complex instruction set, which makes it difficult to optimize code for performance; this complexity can also make it harder to debug software and hardware issues.
• The concept of RISC is to reduce the complexity of individual instructions: a program may need more instructions, but each instruction is simpler, so low-level operations are easier to achieve and instructions can execute faster.
CISC vs RISC
• RISC: ARM processors are commonly used in mobile devices; MIPS processors are used in embedded systems and networking equipment.
ARM (Advanced RISC Machine)
• An embedded system refers to the use of electronics and software within a product, dedicated to a specific function.
Embedded Systems Requirements
• Different sizes
• Different requirements
• Safety, reliability, real-time, flexibility, legislation
• Lifespan
• Environmental conditions
Possible Organization of an Embedded System
• An actuator is a device that receives an input signal (electrical, pneumatic, or hydraulic) and converts it into mechanical motion or force.
• Example: a stepper motor, in which electrical energy drives the motor.
ARM Evolution
• Designed by ARM Inc., Cambridge, England
• Licensed to manufacturers
• High speed, small die, low power consumption
• Used in PDAs, handheld games, and phones (e.g., iPod, iPhone)
• Acorn produced the ARM1 and ARM2 in 1985 and the ARM3 in 1989
• Acorn, VLSI, and Apple Computer founded ARM Ltd.
ARM Systems Categories
Application platform
• Linux, Palm OS, Symbian OS, Windows Mobile
Secure applications
Cloud Service
Thank You