Week2 - 1

This document discusses the evolution of computer memory and processor architectures. It describes how magnetic core memory worked and was later replaced by semiconductor memory as it became smaller, faster, and cheaper. It outlines 13 generations of semiconductor memory density increases. It also discusses the evolution of Intel microprocessors from the 4004 to modern multi-core designs. Additionally, it covers techniques used to increase processor speeds like pipelining, branch prediction, and speculative execution which attempt to keep the processor busy by predicting future instructions.


COAL

Basic Concepts and Evaluation


Instruction Set Architectures X86, ARM

Dr. Zafar Iqbal


Assistant Professor
Department of Cyber Security, FCAI.
Air University, Islamabad.

2
Magnetic Memory
• In core storage, each ferrite ring can represent a 0 or 1 bit, depending on its magnetic state.
• If magnetized in one direction, it represents a 1 bit, and
• if magnetized in the opposite direction, it represents a 0 bit.
• These cores are magnetized by sending an electric current through the wires on which the core is laced.
• It is this direction of current that determines the state of each core.

Types of internal storage (tpub.com) 2


Semiconductor Memory
• In 1970, Fairchild produced the first relatively large semiconductor memory.
• It was about the size of a single core (i.e., one bit of magnetic core storage) but held 256 bits.
• It took only 70 billionths of a second to read a bit.
• However, the cost per bit was higher than that of core.
• In 1978, the price per bit of semiconductor memory dropped below the price per bit of core memory.
• Following this, there was a rapid decline in memory cost, accompanied by a corresponding increase in physical memory density.
• This led the way to smaller, faster machines with memory sizes matching those of larger, more expensive machines from just a few years earlier.

3
Semiconductor Memory
• Since 1970, semiconductor memory has been through 13 generations:

• 1K, 4K, 16K, 64K, 256K, 1M, 4M, 16M, 64M, 256M, 1G, 4G, and, as of this writing, 8 Gb on a single chip (1K = 2^10, 1M = 2^20, 1G = 2^30).

• Density was projected to reach 16 Gb by 2018 and 32 Gb by 2023 [ITRS14].

• Just as the density of elements on memory chips has continued to rise, so has the density of
elements on processor chips.
• As time went on, more and more elements were placed on each chip,
• So that fewer and fewer chips were needed to construct a single computer processor

4
Intel
1971 – 4004
• First microprocessor
• All CPU components on a single chip
• Added two 4-bit numbers
• Multiplied by repeated addition

1972 – 8008
• 8-bit
• Both the 4004 and 8008 were designed for specific applications

1974 – 8080
• Intel’s first general-purpose microprocessor
Dr. Syed Atif Ali Shah 5
Evolution of Intel Micro Processors
Memory/Storage Architecture: Historical Trend
The gap between compute and memory/storage is increasing

https://medium.com/@abruyns/memory-holds-the-keys-to-ai-adoption-5acd5e06508b Pure Storage Inc. 10


Speeding it up

Pipelining
Branch prediction
Superscalar execution
Data flow analysis
Speculative execution
11
Pipelining
• Pipelining enables a processor to work simultaneously on multiple instructions.
• For example, while one instruction is being executed, the computer is decoding the next
instruction.

12
https://www.researchgate.net/publication/352189762_Advancements_in_Microprocessor_Architecture_for_Ubiquitous_AI-An_Overview_on_History_Evolution_and_Upcoming_Challenges_in_AI_Implementation
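To make the overlap concrete, here is a toy Python sketch (an illustrative assumption, not from the slides) comparing cycle counts with and without a 3-stage pipeline:

```python
# Toy model: with pipelining, a new instruction enters the pipeline each
# cycle, so n instructions finish in (stages + n - 1) cycles instead of
# n * stages.

def sequential_cycles(n_instructions, n_stages):
    # Without pipelining, each instruction occupies all stages in turn.
    return n_instructions * n_stages

def pipelined_cycles(n_instructions, n_stages):
    # The first instruction takes n_stages cycles to drain; afterwards
    # one instruction completes per cycle.
    return n_stages + (n_instructions - 1)

print(sequential_cycles(10, 3))  # 30 cycles sequentially
print(pipelined_cycles(10, 3))   # 12 cycles pipelined
```

For long instruction streams the speedup approaches the number of stages, which is why deeper pipelines look attractive until hazards intervene.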
Pipelining Limitations
• The speed of a pipeline is eventually limited by its slowest stage.
• For this reason, conventional processors rely on very deep pipelines (e.g., 20-stage pipelines).

• However, in typical program traces, every fifth or sixth instruction is a conditional jump!

• This requires very accurate branch prediction.

• The penalty of a misprediction grows with the depth of the pipeline, since a larger number of instructions will have to be flushed.

13
https://www.researchgate.net/publication/352189762_Advancements_in_Microprocessor_Architecture_for_Ubiquitous_AI-An_Overview_on_History_Evolution_and_Upcoming_Challenges_in_AI_Implementation
Branch prediction
• Processor looks ahead in the instruction code fetched from memory and predicts which
branches, or groups of instructions, are likely to be processed next.
• Techniques
• Static Branch Prediction
• Dynamic Branch Prediction

14
Branch prediction (BP)
• Processor looks ahead in the instruction code fetched from memory and predicts which branches, or groups of instructions, are likely to be processed next.
• Techniques
• Static Branch Prediction
• Dynamic Branch Prediction

Static BP: the underlying hardware always assumes either that the branch is never taken or that the branch is always taken.

15
Branch prediction (BP)
• Processor looks ahead in the instruction code fetched from memory and predicts which branches, or groups of instructions, are likely to be processed next.
• Techniques
• Static Branch Prediction
• Dynamic Branch Prediction

Dynamic BP: the prediction made by the underlying hardware is not fixed; rather, it changes dynamically.
• 1-bit branch prediction technique
• 2-bit branch prediction technique
• Correlating branch prediction technique

16
Branch prediction (BP)
• Dynamic Branch Prediction
• 1-bit branch prediction technique: a single bit records whether the branch was last taken; the next prediction simply repeats the last outcome.

17
Branch prediction (BP)
• Dynamic Branch Prediction
• 2-bit branch prediction technique:
• the underlying hardware does not change its prediction after one incorrect guess,
• rather, it changes its prediction only after two consecutive wrong guesses.

19
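The behavior above can be sketched as a 2-bit saturating counter (a minimal Python illustration, not tied to any particular processor):

```python
# 2-bit saturating-counter branch predictor. States 0-1 predict
# "not taken", states 2-3 predict "taken"; from a strong state, the
# prediction flips only after two consecutive mispredictions.

class TwoBitPredictor:
    def __init__(self):
        self.state = 0  # start in "strongly not taken"

    def predict(self):
        return self.state >= 2  # True means "predict taken"

    def update(self, taken):
        # Saturate at 0 and 3 instead of wrapping around.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

p = TwoBitPredictor()
for actual in [True, True, True]:
    p.update(actual)
# After a run of taken branches the counter saturates at 3;
# a single not-taken outcome will not flip the prediction.
```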
Branch prediction (BP)
• Dynamic Branch Prediction
• Correlating branch prediction technique: the prediction for a branch also takes into account the recent behavior of branches, using a local history table that indexes into a local prediction table.

20
https://www.geeksforgeeks.org/correlating-branch-prediction/
Branch prediction
• Processor looks ahead in the instruction code fetched from memory and predicts which branches, or groups of instructions, are likely to be processed next.

• If the processor guesses right most of the time, it can prefetch the correct instructions and buffer them so that the processor is kept busy.

• Thus, branch prediction increases the amount of work available for the processor to execute.

21
Branch Prediction Limitations
• However, in typical program traces, every fifth or sixth instruction is a conditional jump!
• This requires very accurate branch prediction.

• The penalty of a misprediction grows with the depth of the pipeline, since a larger number of instructions will have to be flushed.

22
https://www.researchgate.net/publication/352189762_Advancements_in_Microprocessor_Architecture_for_Ubiquitous_AI-An_Overview_on_History_Evolution_and_Upcoming_Challenges_in_AI_Implementation
Superscalar execution
• One simple way of relieving these bottlenecks is to use multiple pipelines.
• In effect, multiple parallel pipelines are used.
• Drawback: wastage of resources due to data dependencies.

23
Data flow analysis
• Processor analyzes which instructions are dependent on each other’s results, or data, to
create an optimized schedule of instructions.
• In fact, instructions are scheduled to be executed when ready, independent of the original
program order.
• This prevents unnecessary delay.

24
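A minimal sketch of the idea (hypothetical instruction tuples, not a real scheduler): issue each instruction as soon as the registers it reads are available, independent of program order.

```python
# Toy data flow scheduler: group instructions into issue cycles based
# only on when their input registers become available.

def schedule(instructions):
    """instructions: list of (name, reads, writes) in program order.
    Returns instructions grouped by issue cycle."""
    ready_at = {}   # register -> cycle its value is available
    cycles = {}
    for name, reads, writes in instructions:
        # An instruction can issue once every register it reads is ready.
        start = max([ready_at.get(r, 0) for r in reads], default=0)
        cycles.setdefault(start, []).append(name)
        for w in writes:
            ready_at[w] = start + 1  # result available next cycle
    return [cycles[c] for c in sorted(cycles)]

prog = [
    ("i1", [], ["r1"]),      # r1 = load
    ("i2", ["r1"], ["r2"]),  # r2 = r1 + 1 (depends on i1)
    ("i3", [], ["r3"]),      # r3 = load (independent of i1/i2)
    ("i4", ["r3"], ["r4"]),  # r4 = r3 * 2 (depends on i3)
]
print(schedule(prog))  # i1 and i3 issue together; then i2 and i4
```

The toy model ignores structural hazards and write-after-write conflicts, but it shows how independent instructions (i1 and i3) can run together even though they are not adjacent in program order.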
Speculative execution

• Speculative execution = Branch Prediction + Data Flow Analysis

25
Performance Balance

• Processor speed increased
• Memory capacity increased
• Memory speed lags behind processor speed

26
Logic and Memory Performance Gap

27
Memory/Storage Architecture: Historical Trend
The gap between compute and memory/storage is increasing

https://medium.com/@abruyns/memory-holds-the-keys-to-ai-adoption-5acd5e06508b Pure Storage Inc. 28


Solution
• Processor power has raced ahead rapidly, while other critical components of the computer have not kept up.

• The problem created by such mismatches is particularly critical at the interface between processor and main memory.

• While processor speed has grown rapidly, the speed with which data can be transferred between main memory and the processor has lagged badly.

• The result is a need to look for performance balance: an adjusting of the organization and architecture to compensate for the mismatch among the capabilities of the various components.

29
Solutions
• The problem created by such mismatches is particularly critical at the interface between processor and main
memory.

• While processor speed has grown rapidly, the speed with which data can be transferred between main memory and
the processor has lagged badly.

• The interface between processor and main memory is the most crucial pathway in the entire computer

• It is responsible for carrying a constant flow of program instructions and data between memory chips and the
processor.

• If memory or the pathway fails to keep pace with the processor’s insistent demands, the processor stalls in a wait
state, and valuable processing time is lost.

30
Solutions
• A system architect can attack this problem in a number of ways.

• Increase the number of bits that are retrieved at one time by making DRAMs “wider” and using wide bus data paths.

• Change the DRAM interface to make it more efficient by including a cache.

• This includes the incorporation of one or more caches on the processor chip, as well as an off-chip cache close to the processor chip.

31
Typical I/O Device Data Rates

[Figure 2.1: Typical I/O device data rates, spanning roughly 10^1 to 10^11 bps — keyboard, mouse, scanner, laser printer, optical disc, hard disk, Wi-Fi modem (max speed), graphics display, Ethernet modem (max speed)]

• Figure 2.1 gives some examples of typical peripheral devices in use on personal computers and workstations.

• These devices create tremendous data throughput demands.

• While the current generation of processors can handle the data pumped out by these devices, there remains the problem of getting that data moved between processor and peripheral.

32
I/O Devices
Peripherals with intensive I/O demands

Processors can handle these demands through:

• Caching
• Buffering
• Higher-speed interconnection buses
• More elaborate bus structures
• Multiple-processor configurations
33
Key is Balance
• Designers constantly struggle to balance the throughput and processing demands of the processor, main
memory, I/O devices, and the interconnection structures.

• This design must constantly be reconsidered to cope with two constantly evolving factors:

• The rate at which performance is changing in the various technology areas (processor, buses,
memory, peripherals) differs greatly from one type of element to another.

• New applications and new peripheral devices constantly change the nature of the demand on
the system in terms of typical instruction profile and the data access patterns.

34
Improvements in Chip Organization and Architecture
Increase hardware speed of processor

• Fundamentally due to shrinking logic gate size


• More gates, packed more tightly, increasing clock rate
• Propagation time for signals reduced

Increase size and speed of caches

• Dedicating part of processor chip


• Cache access times drop significantly

Change processor organization and architecture

• Increase effective speed of execution


• Parallelism

35
Problems with Clock Speed and Logic Density
Power

• Power density increases with density of logic and clock speed


• Dissipating heat

RC delay

• The speed at which electrons flow is limited by the resistance and capacitance of the metal wires connecting them
• Delay increases as the RC product increases
• As wires become thinner, resistance increases
• As wires come closer together, capacitance increases

Memory latency

• Memory speeds lag processor speeds

Solution:

• More emphasis on organizational and architectural approaches


36
Increased Cache Capacity

https://www.youtube.com/watch?v=IA8au8Qr3lo 37

• Typically, two or three levels of cache between processor and main memory

• Chip density increased

• More cache memory on chip

• Faster cache access

• Pentium chip devoted about 10% of chip area to cache

• Pentium 4 devotes about 50%

39
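As a rough illustration of why on-chip cache helps, here is a minimal direct-mapped cache model in Python (an assumed toy design, not any specific Pentium cache): each memory block maps to exactly one cache line, and a hit avoids a slow main-memory access.

```python
# Toy direct-mapped cache: address -> block -> (line, tag).
# A hit means the tag stored in that line matches the requested tag.

class DirectMappedCache:
    def __init__(self, n_lines, block_size):
        self.n_lines = n_lines
        self.block_size = block_size
        self.tags = [None] * n_lines   # one tag per cache line
        self.hits = self.misses = 0

    def access(self, address):
        block = address // self.block_size
        line = block % self.n_lines
        tag = block // self.n_lines
        if self.tags[line] == tag:
            self.hits += 1
            return True
        self.tags[line] = tag          # fill the line on a miss
        self.misses += 1
        return False

cache = DirectMappedCache(n_lines=4, block_size=16)
for addr in [0, 4, 8, 64, 0, 64]:      # nearby addresses share a block
    cache.access(addr)
# Accesses 4 and 8 hit (same block as 0); 0 and 64 keep evicting
# each other because they map to the same line.
```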
Moore’s law
• Moore observed that the number of transistors that could be put on a single chip was doubling
every year.

• Correctly predicted that this pace would continue into the near future.

40
Instruction Set Architecture
• To command a computer’s hardware, you must speak its
language.
• The words of a computer’s language are called instructions.
• Its instruction vocabulary is called an instruction set.
• Popular instruction sets include:
• Intel x86
• ARM

41
x86 ISA
• Early chips were given technical part numbers, such as 8086, 80386, or 80486.

• This led to the commonly used shorthand of “x86 architecture,” in reference to the last two digits of each chip’s part number.

• Beginning in 1993, the “x86” naming convention gave way to more memorable
(and pronounceable) product names such as:
• Intel® Pentium® processor,
• Intel® Celeron® processor

42
x86 Evolution
8080
• Intel’s first general-purpose microprocessor
• 8-bit data path
• Used in the first personal computer – Altair

8086 – 5 MHz – 29,000 transistors
• Much more powerful, 16-bit
• Instruction cache, prefetches a few instructions
• 8088 (8-bit external bus) used in first IBM PC

80286
• 16 MByte of memory addressable, up from 1 MB

80386
• 32-bit
• Support for multitasking

80486
• Sophisticated, powerful cache and instruction pipelining
• Built-in maths co-processor

43
x86 Evolution
Pentium
• Superscalar
• Multiple instructions executed in parallel

Pentium Pro
• Increased superscalar organization
• Aggressive register renaming
• Branch prediction
• Data flow analysis
• Speculative execution

Pentium II
• MMX technology
• Graphics, video & audio processing

Pentium III
• Additional floating-point instructions for 3D graphics

44


x86 Evolution
Pentium 4
• Note Arabic rather than Roman numerals
• Further floating-point and multimedia enhancements

Core
• First x86 with dual core

Core 2
• 64-bit architecture

Core 2 Quad – 3 GHz
• 820 million transistors
• Four processors on chip

• x86 architecture dominant outside embedded systems
• Instruction set evolved with backwards compatibility; roughly one instruction per month added
• ~500 instructions available
• Organization and technology changed dramatically
• See Intel web pages for detailed information on processors

45
Intel Microprocessor Performance

46
Intel Microprocessor Performance

• Internal memory cache: memory found within modern processors.

• It acts as temporary storage for frequently accessed data and instructions, significantly improving overall system performance.

47
Intel Microprocessor Performance

• Speculative out-of-order execution: branch prediction + data flow analysis

48


Intel Microprocessor Performance

Multimedia Extensions (MMX):
• Released in 1997 for Intel x86 processors.
• Introduced 57 new instructions specifically designed for multimedia tasks like audio, video, image processing, and 3D graphics.
• Utilized a Single Instruction, Multiple Data (SIMD) architecture to process multiple data elements simultaneously.
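The SIMD idea can be imitated in Python with NumPy (an analogy only; real MMX code is x86 assembly): one vector operation processes every data element at once, as in a brightness adjustment over an image's pixels.

```python
import numpy as np

# SIMD analogy: adjust the brightness of many pixels with a single
# vectorized operation instead of one element at a time.

pixels = np.array([10, 200, 130, 255, 0], dtype=np.uint16)

# Scalar style: one element per "instruction", clamped at 255
scalar = np.array([min(p + 50, 255) for p in pixels], dtype=np.uint16)

# SIMD style: a single vectorized add-and-clamp over all elements
simd = np.minimum(pixels + 50, 255)

assert (scalar == simd).all()
```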

49


Intel Microprocessor Performance

• Hyper Threading: Imagine a single core with processing resources like execution units and registers.

• During normal operation, these resources are used by one thread at a time.

• Hyperthreading creates virtual copies of these resources, allowing two threads to "share" the core,
essentially multitasking within the same space.

• Though the threads still need to wait for each other for certain tasks, hyperthreading enables efficient
utilization of idle resources during wait times, potentially boosting overall performance.
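The resource-sharing idea can be loosely imitated with software threads (an analogy, not real hyperthreading): two threads whose wait times overlap finish in roughly the time of one.

```python
import threading
import time

# Two "hardware threads" sharing one core: while one waits (here a
# sleep standing in for a memory stall), the other can make progress.

def worker(results, idx):
    time.sleep(0.1)            # stand-in for a stall / wait time
    results[idx] = idx * idx   # the useful work

results = [None, None]
start = time.time()
threads = [threading.Thread(target=worker, args=(results, i)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start
# The two waits overlap: total time is ~0.1 s rather than ~0.2 s.
```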

50


CISC vs RISC

• “Complex instruction set computer” (CISC, pronounced “sisk”)

• The x86 architecture has a complex instruction set, which makes it difficult to optimize code for performance. This complexity can also make it harder to debug software and hardware issues.

• The main idea behind CISC processors is that:

• a single instruction can be used to do all of the loading, evaluating, and storing operations. Because of this, instructions are relatively more complicated compared to RISC, hence the name Complex Instruction.

• The concept of RISC is to reduce the complexity of individual instructions, even if more instructions are needed to do the same work. In this way, instructions are simplified, low-level operations are easier to achieve, and instructions can be executed faster.

51
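The contrast can be sketched with a toy register machine in Python (hypothetical operations, not real x86 or ARM encodings): the same update `mem[a] = mem[a] + mem[b]` as one CISC-style memory-to-memory instruction versus a RISC-style load/operate/store sequence.

```python
# Toy machine state: a memory and a register file.
memory = {"a": 5, "b": 7}
registers = {}

def cisc_add_mem(dst, src):
    # One complex instruction: reads both operands from memory,
    # adds them, and writes the result straight back to memory.
    memory[dst] = memory[dst] + memory[src]

# RISC style: only loads and stores touch memory; arithmetic
# works on registers.
def risc_load(reg, addr):
    registers[reg] = memory[addr]

def risc_add(rd, rs, rt):
    registers[rd] = registers[rs] + registers[rt]

def risc_store(reg, addr):
    memory[addr] = registers[reg]

cisc_add_mem("a", "b")            # a = 12 in a single instruction

memory.update({"a": 5, "b": 7})   # reset, then the RISC equivalent:
risc_load("r1", "a")
risc_load("r2", "b")
risc_add("r3", "r1", "r2")
risc_store("r3", "a")             # a = 12 again, via four simple steps
```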
CISC vs RISC

• CISC: x86 processors from Intel and AMD.

• RISC: ARM processors commonly used in mobile devices, MIPS processors used in
embedded systems and networking equipment.

52
ARM (Advanced RISC Machine)

• ARM is a family of RISC-based microprocessors and microcontrollers designed by ARM Holdings, Cambridge, England.

• The ARM architecture is the most commonly implemented 32-bit instruction set architecture.

• Used mainly in embedded systems

• An embedded system refers to the use of electronics and software within a product

• ARM is used within a product, not in a general-purpose computer

• Dedicated function

• E.g. anti-lock brakes in a car, digital cameras, cell phones

53
Embedded Systems Requirements

Different sizes
• Different constraints, optimization, reuse

Different requirements
• Safety, reliability, real-time, flexibility, legislation
• Lifespan
• Environmental conditions
• Static v dynamic loads
• Slow to fast speeds
• Computation v I/O intensive
• Discrete event v continuous dynamics

54
54
Possible Organization of an Embedded System

• Through the human interface, we interact with the embedded system: communication channels through which users interact with and control the system, providing access to data, functionalities, and settings.

• The diagnostic port may be used for diagnosing the system.

• Special-purpose field programmable (FPGA), application-specific (ASIC), or even nondigital hardware may be used to increase performance or reliability.

• An actuator is a device that receives an input signal (electrical, pneumatic, or hydraulic) and converts it into mechanical motion or force.
• Example: a stepper motor, where electrical energy drives the motor.

59
ARM Evolution

• Designed by ARM Inc., Cambridge, England
• Licensed to manufacturers
• High speed, small die, low power consumption
• Used in PDAs, hand-held games, phones (e.g. iPod, iPhone)
• Acorn produced the ARM1 & ARM2 in 1985 and the ARM3 in 1989
• Acorn, VLSI and Apple Computer founded ARM Ltd.

60
ARM Systems Categories

Embedded real time

Application platform
• Linux, Palm OS, Symbian OS, Windows mobile

Secure applications

61


Cloud Computing

• A model for enabling universal, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.

62
Cloud Service

63
Thank You
