
International Journal of Research and Scientific Innovation (IJRSI) | Volume V, Issue IV, April 2018 | ISSN 2321–2705

A Comprehensive Survey of Various Processor Types & Latest Architectures

Harisha M.S1, Dr. D. Jayadevappa2
1Research Scholar, Jain University, Bangalore, Karnataka, India
2Professor, Electronics Instrumentation Dept., JSSATE, Bangalore, Karnataka, India

Abstract- This technology survey paper covers application-based processors (ASIP); processors based on Flynn's classification, which includes SISD, SIMD, MISD and MIMD; special processors such as the graphics processing unit (GPU), physics processing unit (PPU), digital signal processor (DSP), network processor, front end processor and co-processor; and processors based on the number of cores, which includes single core, multi-core, multiprocessor, hyper-threading and multi-core with shared cache processors.
Keywords- ASIP, SISD, SIMD, MISD, MIMD, GPU, PPU, DSP,
Network processor, coprocessor.

I. INTRODUCTION

A. Types of processors: processors based on application

Application-specific instruction-set processor (ASIP)

An application-specific instruction-set processor is a component used in system-on-a-chip design. The instruction set of an ASIP is designed to benefit a specific application [1]. This specialization of the core provides a tradeoff between the flexibility of a general-purpose CPU and the performance of an ASIC.

Fig.1 ASIP

B. Processor types based on Flynn's classification

In Flynn's taxonomy, processors are classified based on the number of concurrent instruction and data streams available in the architecture: Single Instruction Single Data (SISD), Single Instruction Multiple Data (SIMD), Multiple Instruction Multiple Data (MIMD) and Multiple Instruction Single Data (MISD) [2].

Fig.2 Flynn's classification

SISD & SIMD

Fig.3 SISD and SIMD

SISD
 SISD is a computer architecture in which a single uni-core processor executes a single instruction stream to operate on data stored in a single memory.
 A sequential computer which exploits no parallelism in either the instruction or data streams.
 Examples of SISD architecture are traditional uniprocessor machines such as a PC or old mainframes [3].

SIMD
 SIMD is a computer with multiple processing elements that perform the same operation on multiple data points simultaneously. It performs parallel computations, but only on a single instruction at any given moment.
 Example: an array processor [4].
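As an illustration added here (not part of the original survey), the single-instruction/multiple-data idea can be sketched in Python: one logical "vector add" covers every data lane, where an SISD machine would issue one scalar add per element. The function name is invented for the sketch.

```python
def simd_add(va, vb):
    # One logical vector "instruction": the same add is applied to
    # every lane of the two operand vectors. Real SIMD hardware
    # performs all lanes in parallel; this loop only models the effect.
    return [a + b for a, b in zip(va, vb)]

# SISD would loop and issue four separate scalar adds;
# SIMD issues a single vector add covering all four lanes.
print(simd_add([1, 2, 3, 4], [10, 20, 30, 40]))  # [11, 22, 33, 44]
```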
MISD & MIMD

Fig.4 MISD and MIMD

MISD
 MISD is a parallel computing architecture in which many functional units perform different operations on the same data.
 This architecture is generally used for fault tolerance.
 Heterogeneous systems operate on the same data stream and must agree on the result.
 Examples include the Space Shuttle flight control computer.

MIMD
 MIMD machines include a number of processors that function asynchronously and independently. At any time, different processors may execute different instructions on different pieces of data.
 Multiple autonomous processors simultaneously execute different instructions on different data.
 Distributed systems are generally recognized to be MIMD architectures, exploiting either a single shared memory space or a distributed memory space [8].

C. Processor classification based on number of cores
 Single core
 Multiprocessor
 Hyper-Threading
 Multi-core
 Multi-core with shared cache

Single core processor

A single-core processor is a microprocessor with a single core on a chip, running a single thread at any one time.

Fig.5 Single core processor

Multiprocessing is the use of two or more central processing units (CPUs) within a single computer system. A multiprocessor is a computer system having two or more processing units, each sharing main memory and peripherals, in order to simultaneously process programs [5].

Fig.6 Multiprocessor
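As a rough sketch (not from the paper), MIMD-style execution — autonomous units running different instruction streams on different data — can be modelled with Python's `multiprocessing` module; the worker functions here are invented for illustration.

```python
from multiprocessing import Process, Queue

def square_all(data):
    return [x * x for x in data]

def negate_all(data):
    return [-x for x in data]

def worker(tag, fn, data, out):
    # Each process behaves like an autonomous processor: its own
    # instruction stream (fn) applied to its own data.
    out.put((tag, fn(data)))

if __name__ == "__main__":
    out = Queue()
    procs = [
        Process(target=worker, args=("squares", square_all, [1, 2, 3], out)),
        Process(target=worker, args=("negated", negate_all, [4, 5], out)),
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    results = dict(out.get() for _ in procs)
    print(results["squares"])  # [1, 4, 9]
    print(results["negated"])  # [-4, -5]
```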


Hyper-Threading is a technology used by some Intel microprocessors that allows a single microprocessor to act like two separate processors to the operating system and the application programs that use it. Intel Hyper-Threading technology uses processor resources more efficiently, enabling multiple threads to run on each core. It increases processor throughput, thus improving overall performance on threaded software [6].

Fig.7 Hyper-threading processor

A multi-core processor is an integrated circuit (IC) which includes two or more processors, resulting in enhanced performance, reduced power consumption, and more efficient simultaneous processing of multiple tasks. A dual-core processor is comparable to having multiple, separate processors installed in the same computer, but because the two processors are actually plugged into the same socket, the connection between them is faster. In practice, a dual-core processor is about one-and-a-half times as powerful as a single-core processor [7].

Fig.8 Multi core processor

Multi-core processor with shared cache

A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) of accessing data from main memory. The L3 cache, and higher-level caches, are shared between the cores and are not split.

Advantages
 Reduces cache underutilization since, when one core is idle, the other core can have access to the whole shared resource.
 It offers faster data transfer between the cores.
 It simplifies the cache coherence logic and reduces the severe penalty caused by false sharing.
 It is well suited for facilitating multi-core application partitioning and pipelining.
 The shared cache architecture provides a better performance/cost ratio than a dedicated cache.

D. Special Processors

Graphics processing unit (GPU)

A graphics processing unit (GPU) is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mobile phones, personal computers, workstations, and game consoles. Modern GPUs are very efficient at manipulating computer graphics and image processing, and their highly parallel structure makes them more efficient than general-purpose CPUs for algorithms where the processing of large blocks of data is done in parallel. In a personal computer, a GPU can be present on a video card, or it can be embedded on the motherboard [8].

Fig.9 GPU

• A specialized circuit designed to rapidly manipulate and alter memory
• Accelerates the building of images in a frame buffer intended for output to a display

General Purpose Graphics Processing Unit (GPGPU)
• A general-purpose graphics processing unit is a modified form of stream processor.
• Transforms the computational power of a modern graphics accelerator's shader pipeline into general-purpose computing power.

GPU architecture
 Generic many-core GPU
 Less space devoted to control logic and caches
 Large register files to support multiple thread contexts


 Low-latency hardware-managed thread switching
 Large number of ALUs per core, with a small user-managed cache per core
 Memory bus optimized for bandwidth
 ~150 Gbps bandwidth allows a large number of ALUs to be serviced simultaneously; the GPU is specialized for highly data-parallel computation
 More transistors can be devoted to data processing rather than data caching and flow control

A physics processing unit (PPU) is a dedicated microprocessor designed to handle the calculations of physics, especially in the physics engine of video games. Examples of calculations involving a PPU include rigid body dynamics, soft body dynamics, collision detection, fluid dynamics, hair and clothing simulation, finite element analysis, and fracturing of objects [9].

A digital signal processor (DSP) is a specialized microprocessor with its architecture optimized for the operational needs of digital signal processing. DSPs measure, filter or compress continuous real-world analog signals. Dedicated DSPs have better power efficiency and hence are more suitable for portable devices such as mobile phones. DSPs often use special memory architectures that are able to fetch multiple data items or instructions at the same time.

Fig.10 Digital signal processor

Digital signal processing algorithms typically require a large number of mathematical operations to be performed quickly and repeatedly on a series of data samples. Signals are constantly converted from analog to digital, manipulated digitally, and then converted back to analog form. A specialized digital signal processor provides a low-cost solution, with better performance, lower latency, and no requirements for specialized cooling or large batteries [10].

Network processor

A network processor is a special-purpose, programmable hardware device that combines the low cost and flexibility of a RISC processor with the speed and scalability of custom silicon (i.e., ASIC chips). Network processors are building blocks used to construct network systems, and are specially designed for networking applications. Network processors are typically software-programmable devices and have generic characteristics similar to the general-purpose central processing units that are commonly used in many different types of equipment and products [11].

Fig.11 Network Processor

Front end processor (FEP)

A front end processor is a small-sized computer which connects networks, such as SNA, or peripheral devices, such as terminals, disk units, printers and tape units, to the host computer. Data is transferred between the host computer and the front end processor through a high-speed parallel interface.

Fig.12 FEP

The front end processor communicates with peripheral devices using serial interfaces through communication networks. The purpose is to off-load from the host computer the work of managing the peripheral devices, transmitting and receiving messages, packet assembly and disassembly, and error detection and correction. Examples include the IBM 3705 Communications Controller and the Burroughs Data Communications Processor. It performs tasks such as telemetry control, data collection, reduction of raw sensor data, analysis of keyboard input, etc. [12].

A coprocessor is a computer processor that supplements the functions of the primary processor. Operations performed by the coprocessor include floating-point arithmetic, graphics, signal processing, string processing, encryption, and I/O interfacing with peripheral devices. By offloading processor-intensive tasks from the main processor, coprocessors accelerate system performance [13].

Fig.13 Coprocessor
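The repeated multiply-accumulate workload that the DSP discussion above describes can be sketched in plain Python as a finite impulse response (FIR) filter; the function and coefficients are illustrative, not taken from the paper.

```python
def fir_filter(samples, coeffs):
    # The DSP staple: a multiply-accumulate repeated over a sliding
    # window of samples. A real DSP would fetch several operands per
    # cycle and keep the accumulator in dedicated hardware.
    taps = len(coeffs)
    out = []
    for i in range(len(samples) - taps + 1):
        acc = 0.0
        for j in range(taps):
            acc += coeffs[j] * samples[i + j]
        out.append(acc)
    return out

# A 3-tap smoothing (low-pass) filter applied to a step input
print(fir_filter([0, 0, 4, 4, 4], [0.25, 0.5, 0.25]))  # [1.0, 3.0, 4.0]
```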

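The offloading idea behind a coprocessor can be mimicked in software — a loose analogy added for illustration, with invented names: a worker thread takes over an auxiliary task while the main flow continues.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def offloaded_checksum(data: bytes) -> str:
    # The auxiliary task handed to the "coprocessor" (a worker thread).
    return hashlib.sha256(data).hexdigest()

with ThreadPoolExecutor(max_workers=1) as coprocessor:
    future = coprocessor.submit(offloaded_checksum, b"payload")
    # The "main processor" keeps doing its own work meanwhile.
    main_result = sum(range(100))
    digest = future.result()  # collect the offloaded result when needed

print(main_result)   # 4950
print(len(digest))   # 64 hex characters
```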
E. Modern Processor Architecture classification

Data Parallel Architectures

• SIMD Processors
– Multiple processing elements driven by a single instruction stream
• Vector Processors
– Uni-processors with vector instructions
• Associative Processors
– SIMD-like processors with associative memory
• Systolic Arrays
– Application-specific VLSI structures

Systolic array processor

Fig.14 Systolic array processor

A systolic array is a homogeneous network of tightly coupled data processing units (DPUs) called cells or nodes. Each DPU independently computes a partial result as a function of the data received from its upstream neighbors, stores the result within itself and passes it downstream. A DPU is similar to a central processing unit (CPU), but has no program counter, since operation is transport-triggered, i.e., triggered by the arrival of a data object. Data flows across the array between neighbors, usually with different data flowing in different directions [14].

Superscalar processor

A superscalar processor is a CPU that implements instruction-level parallelism within a single processor. A superscalar processor can execute more than one instruction during a clock cycle by simultaneously dispatching multiple instructions, resulting in high throughput. Characteristics of the superscalar technique include:
 Instructions are issued from a sequential instruction stream.
 The CPU dynamically checks for data dependencies between instructions at run time.
 The CPU can execute multiple instructions per clock cycle.
 Superscalar machines issue a variable number of instructions each clock cycle, up to some maximum.
 Instructions must satisfy some criteria of independence.
 A simple choice is a maximum of one floating-point and one integer instruction per clock.
 Separate execution paths are needed for each possible simultaneous instruction issue.


 Compiled code from a non-superscalar implementation of the same architecture runs unchanged, but slower.
 Superscalar processing is the ability to initiate multiple instructions during the same clock cycle.
 A typical superscalar processor fetches and decodes the incoming instruction stream several instructions at a time.
 Superscalar architecture exploits the potential of ILP (Instruction Level Parallelism) [15].

IF = Instruction Fetch, ID = Instruction Decode, EX = Execute, MEM = Memory access, WB = Register write back.

Fig.15 Super scalar architecture

Distributed memory MIMD

Fig.16 Distributed memory MIMD

 DM-MIMD machines represent the high-performance computers.
 Each processor has its own local memory.
 The processors are connected to each other.
 The demands imposed on the communication network are lower than in the case of an SM-MIMD machine; the communication between processors may be slower than the communication between processor and memory.
 Distributed memory systems can be hugely expanded.
 The processors can only access their own memory. If they require data from the memory of another processor, then these data have to be copied [16].

Advantages
 The bandwidth problem that haunts shared-memory systems is avoided, because the bandwidth scales up automatically with the number of processors.
 The speed of the memory is less important for DM-MIMD machines, because more processors can be configured without any bandwidth problems.

Disadvantages
 The communication between processors is slower, and hence the synchronization overhead in the case of communicating tasks is higher than in shared-memory machines.
 Accesses to data that are not in the local memory have to be served from non-local memory, and hence are slow compared to local data accesses.

Shared memory MIMD

Fig.17 Shared Memory system

In shared-memory MIMD machines, several processors access a common memory from which they draw their instructions and data. When such a system is used to perform a common task by executing parts of the programs in parallel, there must be some way to coordinate and synchronize the various parts of the program. This is done by a network that connects the CPU cores to each other and to the memory [17].
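The coordination requirement just described can be sketched with threads sharing one address space; the lock stands in for the synchronization mechanism (a minimal illustration added here, not taken from the paper).

```python
import threading

# All workers share one memory (the `counts` dict); a lock provides
# the coordination/synchronization the text describes.
counts = {"total": 0}
lock = threading.Lock()

def worker(items):
    for x in items:
        with lock:  # synchronize access to the shared memory
            counts["total"] += x

threads = [threading.Thread(target=worker, args=([1] * 1000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counts["total"])  # 4000
```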


 All processors are connected to a common memory (RAM, Random Access Memory).
 All processors are identical and have equal memory access. This is called symmetric multiprocessing (SMP).
 The connection between processors and memory is of predominant importance.
 For example: a shared-memory system with a bus connection.

Disadvantage

All processors have to share the bandwidth provided by the bus. To circumvent the problem of limited memory bandwidth, direct connections from each CPU to each memory module are desired. This can be achieved by using a crossbar switch. The problem is its high complexity when many connections need to be made. This problem can be weakened by using multi-stage crossbar switches, which in turn leads to longer communication times. Moreover, the number of CPUs and memory modules that can be connected by crossbar switches is limited.

Advantages
 All processors make use of the whole memory. This makes them easy to program and efficient to use.
 The limiting factor to their performance is the number of processors and memory modules that can be connected to each other.
 Shared-memory systems usually consist of only a few processors.

F. Processor classification tree

II. CONCLUSION

As a part of my ongoing research work on the design, development and implementation of a multi-core hybrid processor, it is necessary to study and understand all existing, contemporary and popular processors and their respective architectures, specifications, feature sets and key performance metrics. Many of these parameters need to be experimented with, implemented and tested on a common test bed. This survey has also thrown up many new varieties of application-specific custom processors which perform with superior efficiency.

REFERENCES

[1]. https://en.wikipedia.org/wiki/Application-specific_instruction_set_processor
[2]. https://en.wikipedia.org/wiki/Flynn%27s_taxonomy
[3]. https://en.wikipedia.org/wiki/SISD
[4]. https://en.wikipedia.org/wiki/SIMD
[5]. https://en.wikipedia.org/wiki/Multiprocessing
[6]. https://en.wikipedia.org/wiki/Hyper-threading
[7]. "Multicore Applications in Real Time Systems", Vaidehi M, T. R. Gopalakrishnan Nair.
[8]. https://en.wikipedia.org/wiki/Graphics_processing_unit
[9]. https://en.wikipedia.org/wiki/Physics_processing_unit
[10]. https://en.wikipedia.org/wiki/Digital_signal_processor
[11]. "Network Processors", Douglas Comer.
[12]. https://en.wikipedia.org/wiki/Front-end_processor
[13]. https://en.wikipedia.org/wiki/Coprocessor
[14]. https://www.revolvy.com/main/index.php?s=Systolic%20array&item_type=topic
[15]. https://en.wikipedia.org/wiki/Superscalar_processor
[16]. "Overview of recent supercomputers", Aad J. van der Steen.
[17]. https://en.wikipedia.org/wiki/MIMD
