
Chapter 1

INTRODUCTION
The computer processor, often referred to as the central processing unit (CPU), serves as
the brain of modern computing devices, from the smallest smartphones to the largest
supercomputers. This report examines the evolution, types, and operational mechanisms
of computer processors, as well as their critical role in advancing technology. It provides
an overview of how processors have transformed from simple machines capable of
executing basic commands to complex systems that drive the forefront of artificial
intelligence and quantum computing. Furthermore, the report explores various facets of
processor performance metrics such as clock speed, core count, and cache memory,
which significantly influence computing efficiency and capabilities. Additionally, it
delves into the impact of processors on modern technology and forecasts future trends in
processor development. Understanding the intricacies of computer processors is essential
for anyone engaged in the fields of computing, electronics, or information technology, as
these components are integral to both hardware functionality and software performance.

1.1 History of computer processor


The history of computer processors begins in the early 20th century, but the significant
development occurred in 1971 with the invention of the first microprocessor, the Intel
4004. This was a revolutionary step that shifted computing from large, room-sized
machines to more compact and accessible devices. Over the decades, advances in
microarchitecture, manufacturing processes, and semiconductor technology have led to
exponential increases in processing power and efficiency.
Initially, processors relied on relatively simple architectures that executed instructions
serially. The pursuit of higher performance drove the evolution from 8-bit to 16-bit, and
subsequently to 32-bit and 64-bit processors. This expansion in bit width enabled
processors to handle more data per clock cycle, improving overall efficiency and
capability.
Parallel processing and the introduction of multiple cores in a single processor during the
late 20th and early 21st centuries marked another significant advancement. This
development allowed for simultaneous processing of tasks, thereby increasing
performance and making multitasking more seamless.
As the processor technology evolved, it catalyzed advancements in other technology
areas, including personal computing, mobile communications, and large-scale data
processing. Each generation of processors brought with it improvements in speed,
efficiency, and the complexity of tasks that could be undertaken, setting the stage for
modern computing landscapes that continue to evolve today through ongoing
innovations.
1.2 First generation processors
First generation processors emerged in the early days of computing, characterized by
their construction using vacuum tubes and magnetic drums. This period, spanning the late
1940s to the late 1950s, included machines such as the ENIAC, UNIVAC, and IBM 701.
These processors were large, consumed excessive amounts of power, and required
significant maintenance, making them impractical for widespread use in consumer
products. Despite these limitations, they were capable of performing calculations much
faster than could be done manually, solving complex mathematical and logistical
problems that were previously unmanageable. These first-generation processors laid
down the foundational principles of computing and paved the way for subsequent
developments in processor technology.
In April 1969, Busicom approached Intel to produce a new design for an electronic
calculator. They based their design on the architecture of the 1965 Olivetti Programma
101, one of the world's first tabletop programmable calculators. The key difference was
that the Busicom design would use integrated circuits to replace the printed circuit boards
filled with individual components, and solid-state shift registers for memory instead of
the costly magnetostriction wire in the 101.
In contrast to earlier calculator designs, Busicom had developed a general-purpose
processor concept with the goal of introducing it in a low-end desktop printing calculator,
and then using the same design for other roles like cash registers and automatic teller
machines. The company had already produced a calculator using TTL small-scale
integration logic ICs and was interested in having Intel reduce the chip count using
Intel's medium-scale integration (MSI) techniques.

Intel assigned the recently hired Marcian Hoff, employee number 12, to act as the liaison
between the two companies. In late June, three engineers from Busicom, Masatoshi
Shima and his colleagues Masuda and Takayama, traveled to Intel to introduce the
design. Although he had only been assigned to liaise with the engineers, Hoff began
studying the concept. Their initial proposal had seven ICs: program control, arithmetic
unit (ALU), timing, program ROM, shift registers for temporary memory, printer
controller and input/output control.
Hoff became concerned that the number of chips and the required interconnections
between them would make Busicom's price goals impossible to meet. Combining the
chips would reduce the complexity and cost. He was also concerned that the still-small
Intel would not have enough design staff to make seven separate chips at the same time.
He raised these concerns with upper management, and Bob Noyce, the CEO, told Hoff he
would support a different approach if it seemed feasible.
The Intel 4004 is a 4-bit central processing unit (CPU) released by Intel Corporation in
1971. Sold for US$60 (equivalent to $450 in 2023), it was the first commercially
produced microprocessor, and the first in a long line of Intel CPUs.

Fig 1.1 Intel 4004
General information
Launched: November 15, 1971
Discontinued: 1981
Marketed by: Intel
Designed by: Intel
Common manufacturer: Intel

Performance
Max. CPU clock rate: 740–750 kHz
Data width: 4 bits
Address width: 12 bits (multiplexed)

Architecture and classification
Application: Busicom calculator, arithmetic manipulation
Technology node: 10 μm
Instruction set: 4-bit BCD-oriented

Physical specifications
Transistors: 2,300
Package(s): 16-pin dual in-line package
Socket(s): DIP16

History
Successor(s): Intel 8008 (8-bit), Intel 4040 (4-bit)
The 4004 was the first significant example of large-scale integration, showcasing the
superiority of the MOS silicon gate technology (SGT). Compared to the incumbent
technology, the SGT integrated on the same chip area twice the number of transistors
with five times the operating speed. This step-function increase in performance made
possible a single-chip CPU, replacing the existing multi-chip CPUs. The innovative 4004
chip design served as a model on how to use the SGT for complex logic and memory
circuits, thus accelerating the adoption of the SGT by the world's semiconductor industry.
The developer of the original SGT at Fairchild was Federico Faggin, who designed the
first commercial integrated circuit (IC) that used the new technology, proving its
superiority for analog/digital applications (Fairchild 3708 in 1968). He later used the
SGT at Intel to obtain the unprecedented integration necessary to make the 4004.
The project traces its history to 1969, when Busicom Corp. approached Intel to design a
family of seven chips for an electronic calculator, three of which constituted a CPU
specialized for making different calculating machines. The CPU was based on data stored
on shift-registers and instructions stored on ROM (read only memory). The complexity of
the three-chip CPU logic design led Marcian Hoff to propose a more conventional CPU
architecture based on data stored on RAM (random-access memory). This architecture
was much simpler and more general-purpose and could potentially be integrated into a
single chip, thus reducing the cost and improving the speed. Design began in April 1970
under the direction of Faggin, aided by Masatoshi Shima, who contributed to the
architecture and later to the logic design. The first delivery of a fully operational 4004
was in March 1971 to Busicom for its 141-PF printing calculator engineering prototype
(now displayed in the Computer History Museum in Mountain View, California).[4]
General sales began July 1971.
A number of innovations developed by Faggin while working at Fairchild Semiconductor
allowed the 4004 to be produced on a single chip. The main concept was the use of the
self-aligned gate, made of polysilicon rather than metal, which allowed the components
to be much closer together and work at higher speed. To make the 4004 possible, Faggin
also developed the "bootstrap load", considered unfeasible with silicon gate, and the
"buried contact" that allowed the silicon gates to be connected directly to the source and
drain of the transistors without the use of metal. Together, these innovations doubled the
circuit density, and thus halved cost, allowing a single chip to contain 2,300 transistors
and run five times faster than designs using the previous MOS technology with aluminum
gates.
The 4004 design was later improved by Faggin as the Intel 4040 in 1974. The Intel 8008
and 8080 were unrelated designs in spite of the similar naming.

1.3 Modern processors


Modern processors are leaps and bounds ahead of their first-generation counterparts,
incorporating billions of transistors, sophisticated power management systems, and
capabilities for multitasking and multimedia processing. These current processors are
built using advanced semiconductor fabrication techniques, notably photolithography,
which allows for an ultra-small feature size on the order of nanometers. This
enhancement in technology not only increases the operational speed and efficiency of
processors but also reduces their power consumption and heat generation.
In contemporary computing, modern processors support a wide range of optimizations
and features such as out-of-order execution, branch prediction, and simultaneous
multithreading. These developments afford vast improvements in task handling and
processing speed, facilitating the running of complex software applications and
supporting the proliferation of artificial intelligence and machine learning technologies in
everyday devices. As a result, they continue to have a transformative impact on personal
and professional computing environments.
and professional computing environments.
1.4 Types of processors
Processors are categorized into various types to cater to specific requirements in the
computing spectrum. The main types include Central Processing Units (CPUs), Graphics
Processing Units (GPUs), and Accelerated Processing Units (APUs).
CPUs, often referred to as the brain of the computer, are general-purpose processors
responsible for executing the majority of commands from the computer's operating
systems and applications. They handle basic instructions such as arithmetic, logic,
controlling, and input/output operations, making them essential for the functionality of
any computing device.
GPUs are dedicated processing units designed to handle complex mathematical and
geometric calculations efficiently, primarily for tasks involving graphics and visual
renderings. This specialization enables GPUs to perform fast image processing which is
particularly beneficial in fields such as gaming, graphic design, and video rendering.
APUs are a hybrid type of processor that combines the functionality of both CPUs and
GPUs. These processors are designed to provide more efficient processing power and
enhanced graphical performance by integrating both central and graphics processing
capabilities onto a single chip. This integration helps in boosting the performance of both
graphics and computing tasks, making APUs ideal for compact devices that require good
performance without the physical bulk of multiple chips.
Each type of processor plays a critical role in computing, with distinctions catering to
specific performance needs and applications, structuring a versatile and functional
computing environment.

Chapter 2

COMPONENTS OF CPU
The brain of any computer system is the CPU. It controls the functioning of the other
units and processes the data. The CPU is sometimes called the processor or, in the personal
computer field called “microprocessor”. It is a single integrated circuit that contains all
the electronics needed to execute a program. The processor calculates (add, multiplies
and so on), performs logical operations (compares numbers and make decisions), and
controls the transfer of data among devices. The processor acts as the controller of all
actions or services provided by the system. Processor actions are synchronized to its
clock input. A clock signal consists of clock cycles. The time to complete a clock cycle is
called the clock period. Normally, we use the clock frequency, which is the inverse of the
clock period, to specify the clock. The clock frequency is measured in Hertz, which
represents one cycle per second. Hertz is abbreviated as Hz. Usually, we use megahertz
(MHz) and gigahertz (GHz), as in a 1.8 GHz Pentium. The processor can be thought of as
executing the following cycle forever:
1. Fetch an instruction from memory.
2. Decode the instruction (i.e., determine the instruction type).
3. Execute the instruction (i.e., perform the action specified by the instruction).
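The three steps above can be sketched as a simple loop. The toy machine below (a single accumulator and a made-up LOAD/ADD/HALT instruction set, invented purely for illustration) is not any real ISA:

```python
# A minimal sketch of the fetch-decode-execute cycle for a hypothetical
# accumulator machine. The instruction names are illustrative only.
def run(program):
    acc = 0   # accumulator register
    pc = 0    # program counter
    while True:
        instr = program[pc]      # 1. fetch the instruction at address PC
        pc += 1                  #    advance PC to the next instruction
        op, arg = instr          # 2. decode: split into opcode and operand
        if op == "LOAD":         # 3. execute the decoded operation
            acc = arg
        elif op == "ADD":
            acc += arg
        elif op == "HALT":
            return acc

print(run([("LOAD", 5), ("ADD", 3), ("HALT", None)]))  # prints 8
```

Real processors do the same thing in hardware, with the decode step driving control signals rather than an if/elif chain.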
Execution of an instruction involves fetching any required operands, performing the
specified operation, and writing the results back. This process is often referred to as the
fetch execute cycle, or simply the execution cycle. The execution cycle is repeated as
long as there are more instructions to execute. This raises several questions. Who
provides the instructions to the processor? Who places these instructions in the main
memory? How does the processor know where in memory these instructions are located?
When we write programs—whether in a high-level language or in an assembly
language— we provide a sequence of instructions to perform a particular task (i.e., solve
a problem). A compiler or assembler will eventually translate these instructions to an
equivalent sequence of machine language instructions that the processor understands. The
operating system, which provides instructions to the processor whenever a user program
is not executing, loads the user program into the main memory. The operating system
then indicates the location of the user program to the processor and instructs it to execute
the program. The actions of the CPU during an execution cycle are defined by micro-orders
issued by the control unit. These micro-orders are individual control signals sent
over dedicated control lines. For example, let us assume that we want to execute an
instruction that moves the contents of register X to register Y. Let us also assume that
both registers are connected to the data bus, D. The control unit will issue a control signal
to tell register X to place its contents on the data bus D. After some delay, another control
signal will be sent to tell register Y to read from data bus D. Fig 2.1 shows the
components of the CPU.

Fig 2.1 Components of a processor

A typical CPU has three major components: (1) register set, (2) arithmetic logic unit
(ALU), and (3) control unit (CU). The register set differs from one computer architecture
to another. It is usually a combination of general-purpose and special purpose registers.
General-purpose registers are used for any purpose, hence the name general purpose.
Special purpose registers have specific functions within the CPU. For example, the
program counter (PC) is a special-purpose register that is used to hold the address of the
instruction to be executed next. Another example of special-purpose registers is the
instruction register (IR), which is used to hold the instruction that is currently being
executed. Fig 2.1 shows the main components of the CPU and its interactions with the
memory system and the input/output devices.
The ALU provides the circuitry needed to perform the arithmetic, logical and shift
operations demanded of the instruction set. The control unit is the entity responsible for
fetching the instruction to be executed from the main memory and decoding and then
executing it.
The CPU can be divided into a data section and a control section. The data section,
which is also called the datapath, contains the registers (known as the register file) and
the ALU. The datapath is capable of performing certain operations on data items. The
register file can be thought of as a small, fast memory, separate from the system memory,
which is used for temporary storage during computation.
The control section is basically the control unit, which issues control signals to the
datapath. The control unit of a computer is responsible for executing the program
instructions, which are stored in the main memory. It can be thought of as a form of a
“computer within a computer” in the sense that it makes decisions as to how the rest of
the machine behaves.
Like the system memory, each register in the register file is assigned an address in
sequence starting from zero. These register “addresses” are much smaller than main
memory addresses: a register file containing 32 registers would have only a 5-bit address,
for example. The major difference between the register file and the system memory is
that the register file is contained within the CPU, and is therefore much faster. An
instruction that operates on data from the register file can often run ten times faster than
the same instruction that operates on data in memory. For this reason, register-intensive
programs are faster than the equivalent memory-intensive programs, even if it takes more
register operations to do the same tasks that would require fewer operations with the
operands located in memory.
The Central Processing Unit (CPU) is the primary component of a computer that handles
most of the operations and instructions from any program. CPUs perform fundamental
duties to execute the majority of digital tasks by processing data and controlling the flow
of data in the computer system. This essential processor reads instructions from a
program through the fetch-decode-execute cycle and carries out computational tasks.
Modern CPUs are equipped with multiple cores, enabling them to handle numerous tasks
simultaneously, a capability known as multitasking. Each core in a multi-core processor
can work on a different task, dramatically improving performance and efficiency when
running complex software applications. Additionally, advancements in CPU technology
have led to enhancements in cache memory and clock speeds, further boosting the
processor's capability to manage higher loads effectively.
CPUs are also crucial in determining the overall speed and power efficiency of a
computer system. They are designed to be compatible with various motherboards and to
support a wide range of peripherals, thus serving as the cornerstone for building
customizable and upgradable computer systems. These processors continue to evolve,
incorporating new technologies to meet the increasing demands for faster processing and
support for new workloads.
2.1 GPU
Graphics Processing Units (GPUs) are specialized hardware designed primarily for
rendering graphics and image processing. Originally engineered to accelerate the creation
and rendering of images and videos for computer screens, GPUs now play a pivotal role
in a variety of demanding computational tasks.
The architecture of a GPU is optimized for parallel processing, allowing it to perform
thousands of simple calculations simultaneously, which is ideal for graphics rendering
and data-heavy scientific calculations. This capability makes GPUs particularly effective
not only in the realm of gaming and visual content creation but also in areas like machine
learning, video editing, and complex simulations where handling large blocks of data
concurrently is beneficial.

GPUs differ from CPUs in their core architecture. While a CPU has fewer cores with lots
of cache memory capable of handling a few software threads at a time, a GPU has
thousands of smaller cores designed for multi-thread processing with less cache memory
per core. This design enables efficient handling of multiple tasks, making GPUs
extremely efficient for algorithms that process large blocks of data in parallel.
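The contrast described above comes down to a programming pattern: when each output element depends only on its own input element, the work can be split across any number of cores. A rough sketch (the thread pool merely stands in for a GPU's thousands of cores; the per-pixel operation is invented for illustration):

```python
# Data-parallel pattern GPUs exploit: every element can be processed
# independently, so the work divides cleanly across cores.
from concurrent.futures import ThreadPoolExecutor

def brighten(pixel):
    # independent per-element work, e.g. brightening one 8-bit pixel
    return min(pixel + 50, 255)

pixels = [0, 100, 200, 250]
with ThreadPoolExecutor(max_workers=4) as pool:
    result = list(pool.map(brighten, pixels))
print(result)  # [50, 150, 250, 255]
```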
Alongside traditional use in personal computing and gaming consoles, GPUs are
increasingly leveraged in server and supercomputing environments, highlighting their
growing significance in both graphical and non-graphical computing applications.
2.2 Components
 Control unit
The control unit (CU) is a component of the CPU that directs the operation of the
processor. It tells the computer's memory, arithmetic and logic unit and input and output
devices how to respond to the instructions that have been sent to the processor.
It directs the operation of the other units by providing timing and control signals. Most
computer resources are managed by the CU. It directs the flow of data between the CPU
and the other devices. John von Neumann included the control unit as part of the von
Neumann architecture. In modern computer designs, the control unit is typically an
internal part of the CPU, with its overall role and operation unchanged since
its introduction.
 Arithmetic and logical unit
The arithmetic logic unit (ALU) is a digital circuit within the processor that performs
integer arithmetic and bitwise logic operations. The inputs to the ALU are the data words
to be operated on
(called operands), status information from previous operations, and a code from the
control unit indicating which operation to perform. Depending on the instruction being
executed, the operands may come from internal CPU registers, external memory, or
constants generated by the ALU itself.
When all input signals have settled and propagated through the ALU circuitry, the result
of the performed operation appears at the ALU's outputs. The result consists of both a
data word, which may be stored in a register or memory, and status information that is
typically stored in a special, internal CPU register reserved for this purpose. Modern
CPUs typically contain more than one ALU to improve performance.
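As a sketch of the interface just described — operands and an operation code in, a data word and status flags out — consider a toy 8-bit ALU (the operation names and flag set are illustrative, not a real instruction encoding):

```python
# Toy 8-bit ALU: returns the result word plus status flags, as described above.
def alu(op, a, b):
    if op == "ADD":
        raw = a + b
    elif op == "SUB":
        raw = a - b
    elif op == "AND":
        raw = a & b
    else:
        raise ValueError("unknown operation: " + op)
    result = raw & 0xFF                      # keep only the low 8 bits
    flags = {
        "zero": result == 0,                 # result was all zeros
        "carry": raw > 0xFF or raw < 0,      # carry-out or borrow occurred
    }
    return result, flags

res, flags = alu("ADD", 200, 100)  # 300 does not fit in 8 bits
print(res, flags["carry"])         # 44 True (result wrapped, carry set)
```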
 Address generation unit
The address generation unit (AGU), sometimes also called the address computation unit
(ACU),[70] is an execution unit inside the CPU that calculates addresses used by the
CPU to access main memory. By having address calculations handled by separate
circuitry that operates in parallel with the rest of the CPU, the number of CPU cycles
required for executing various machine instructions can be reduced, bringing
performance improvements.
While performing various operations, CPUs need to calculate memory addresses required
for fetching data from the memory; for example, in-memory positions of array elements
must be calculated before the CPU can fetch the data from actual memory locations.
Those address-generation calculations involve different integer arithmetic operations,
such as addition, subtraction, modulo operations, or bit shifts. Often, calculating a
memory address involves more than one general-purpose machine instruction, which do
not necessarily decode and execute quickly. By incorporating an AGU into a CPU design,
together with introducing specialized instructions that use the AGU, various address-
generation calculations can be offloaded from the rest of the CPU, and can often be
executed quickly in a single CPU cycle.
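The address calculations mentioned above are simple in form. For example, locating element i of an array of fixed-size elements takes one multiply (or a shift, when the size is a power of two) and one add — exactly the kind of work an AGU performs in a single cycle. The addresses below are illustrative:

```python
# Address of array element i: base + i * element_size, the typical
# AGU-style calculation. Values are made up for illustration.
def element_address(base, index, element_size):
    return base + index * element_size

base = 0x1000
# for 4-byte elements, index * 4 equals index << 2 (a single shift)
assert element_address(base, 5, 4) == base + (5 << 2)
print(hex(element_address(base, 5, 4)))  # 0x1014
```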
Capabilities of an AGU depend on a particular CPU and its architecture. Thus, some
AGUs implement and expose more address-calculation operations, while some also
include more advanced specialized instructions that can operate on multiple operands at a
time. Some CPU architectures include multiple AGUs so more than one address-
calculation operation can be executed simultaneously, which brings further performance
improvements due to the superscalar nature of advanced CPU designs. For example, Intel
incorporates multiple AGUs into its Sandy Bridge and Haswell microarchitectures, which
increase bandwidth of the CPU memory subsystem by allowing multiple memory-access
instructions to be executed in parallel.
 Cache
A CPU cache is a hardware cache used by the central processing unit (CPU) of a
computer to reduce the average cost (time or energy) to access data from the main
memory. A cache is a smaller, faster memory, closer to a processor core, which stores
copies of the data from frequently used main memory locations. Most CPUs have
different independent caches, including instruction and data caches, where the data cache
is usually organized as a hierarchy of more cache levels (L1, L2, L3, L4, etc.).
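The "average cost" reduction can be quantified with the standard average memory access time (AMAT) formula — not stated in the text, but a common textbook metric that follows from the description: hit time plus miss rate times miss penalty. The latencies and rates below are illustrative, not from any real CPU:

```python
# Back-of-envelope average memory access time with one cache level.
def amat(hit_time, miss_rate, miss_penalty):
    return hit_time + miss_rate * miss_penalty

# 1 ns L1 hit, 5% miss rate, 20 ns penalty to fetch from the next level:
print(amat(1.0, 0.05, 20.0))  # 2.0 ns on average, vs 20 ns with no cache
```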
All modern (fast) CPUs (with few specialized exceptions) have multiple levels of CPU
caches. The first CPUs that used a cache had only one level of cache; unlike later level 1
caches, it was not split into L1d (for data) and L1i (for instructions). Almost all current
CPUs with caches have a split L1 cache. They also have L2 caches and, for larger
processors, L3 caches as well. The L2 cache is usually not split and acts as a common
repository for the already split L1 cache. Every core of a multi-core processor has a
dedicated L2 cache that is usually not shared between the cores. The L3 cache, and
higher-level caches, are shared between the cores and are not split. An L4 cache is
currently uncommon, and is generally on dynamic random-access memory (DRAM),
rather than on static random-access memory (SRAM), on a separate die or chip. That was
also the case historically with L1, while bigger chips have allowed integration of it and
generally all cache levels, with the possible exception of the last level. Each extra level of
cache tends to be bigger and is optimized differently.

Other types of caches exist (that are not counted towards the "cache size" of the most
important caches mentioned above), such as the translation lookaside buffer (TLB) that is
part of the memory management unit (MMU) that most CPUs have.
Caches are generally sized in powers of two: 2, 8, 16, etc., KiB or MiB (for larger, non-L1
caches), although the IBM z13 has a 96 KiB L1 instruction cache.
 Clock rate
Most CPUs are synchronous circuits, which means they employ a clock signal to pace
their sequential operations. The clock signal is produced by an external oscillator circuit
that generates a consistent number of pulses each second in the form of a periodic square
wave. The frequency of the clock pulses determines the rate at which a CPU executes
instructions and, consequently, the faster the clock, the more instructions the CPU will
execute each second.
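Since the clock period is the inverse of the clock frequency, the figures quoted earlier in this report translate directly into cycle times, as this small sketch shows:

```python
# Clock period (in nanoseconds) is the inverse of clock frequency (in Hz).
def clock_period_ns(frequency_hz):
    return 1e9 / frequency_hz

print(round(clock_period_ns(1.8e9), 3))  # 0.556 ns for a 1.8 GHz CPU
print(round(clock_period_ns(740e3)))     # ~1351 ns for the 740 kHz Intel 4004
```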
To ensure proper operation of the CPU, the clock period is longer than the maximum time
needed for all signals to propagate (move) through the CPU. In setting the clock period to
a value well above the worst-case propagation delay, it is possible to design the entire
CPU and the way it moves data around the "edges" of the rising and falling clock signal.
This has the advantage of simplifying the CPU significantly, both from a design
perspective and a component-count perspective. However, it also carries the disadvantage
that the entire CPU must wait on its slowest elements, even though some portions of it
are much faster. This limitation has largely been compensated for by various methods of
increasing CPU parallelism (see below).
However, architectural improvements alone do not solve all of the drawbacks of globally
synchronous CPUs. For example, a clock signal is subject to the delays of any other
electrical signal. Higher clock rates in increasingly complex CPUs make it more difficult
to keep the clock signal in phase (synchronized) throughout the entire unit. This has led
many modern CPUs to require multiple identical clock signals to be provided to avoid
delaying a single signal significantly enough to cause the CPU to malfunction. Another
major issue, as clock rates increase dramatically, is the amount of heat that is dissipated
by the CPU. The constantly changing clock causes many components to switch regardless
of whether they are being used at that time. In general, a component that is switching uses
more energy than an element in a static state. Therefore, as clock rate increases, so does
energy consumption, causing the CPU to require more heat dissipation in the form of
CPU cooling solutions.
One method of dealing with the switching of unneeded components is called clock
gating, which involves turning off the clock signal to unneeded components (effectively
disabling them). However, this is often regarded as difficult to implement and therefore
does not see common usage outside of very low-power designs. One notable recent CPU
design that uses extensive clock gating is the IBM PowerPC-based Xenon used in the
Xbox 360; this reduces the power requirements of the Xbox 360.
2.3 Operations performed
The fundamental operation of most CPUs, regardless of the physical form they take, is to
execute a sequence of stored instructions that is called a program. The instructions to be
executed are kept in some kind of computer memory. Nearly all CPUs follow the fetch,
decode and execute steps in their operation, which are collectively known as the
instruction cycle.
After the execution of an instruction, the entire process repeats, with the next instruction
cycle normally fetching the next-in-sequence instruction because of the incremented
value in the program counter. If a jump instruction was executed, the program counter
will be modified to contain the address of the instruction that was jumped to and program
execution continues normally. In more complex CPUs, multiple instructions can be
fetched, decoded and executed simultaneously. This section describes what is generally
referred to as the "classic RISC pipeline", which is quite common among the simple
CPUs used in many electronic devices (often called microcontrollers). It largely ignores
the important role of CPU cache, and therefore the access stage of the pipeline.
Some instructions manipulate the program counter rather than producing result data
directly; such instructions are generally called "jumps" and facilitate program behavior
like loops, conditional program execution (through the use of a conditional jump), and
existence of functions.[c] In some processors, some other instructions change the state of
bits in a "flags" register. These flags can be used to influence how a program behaves,
since they often indicate the outcome of various operations. For example, in such
processors a "compare" instruction evaluates two values and sets or clears bits in the flags
register to indicate which one is greater or whether they are equal; one of these flags
could then be used by a later jump instruction to determine program flow.
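This compare-then-jump pattern can be sketched as a tiny Python model. It is a hypothetical illustration: the flag names and instruction semantics below are invented for clarity and are not taken from any real ISA.

```python
# Toy model of a "compare" instruction setting flags,
# and a later conditional jump reading them.
flags = {"zero": False, "greater": False}

def compare(a, b):
    # Sets flags based on the comparison, like a CMP instruction.
    flags["zero"] = (a == b)
    flags["greater"] = (a > b)

def jump_if_greater(target_pc, current_pc):
    # Returns the next program-counter value: the jump target if the
    # "greater" flag is set, otherwise the next sequential instruction.
    return target_pc if flags["greater"] else current_pc + 1

compare(7, 3)
next_pc = jump_if_greater(target_pc=100, current_pc=10)
print(next_pc)  # 100, because 7 > 3 set the "greater" flag
```

A loop or an if-statement in a high-level language compiles down to exactly this kind of compare-and-conditionally-jump sequence.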
• Fetch
Fetch involves retrieving an instruction (which is represented by a number or sequence of
numbers) from program memory. The instruction's location (address) in program memory
is determined by the program counter (PC; called the "instruction pointer" in Intel x86
microprocessors), which stores a number that identifies the address of the next instruction
to be fetched. After an instruction is fetched, the PC is incremented by the length of the
instruction so that it will contain the address of the next instruction in the sequence.[d]
Often, the instruction to be fetched must be retrieved from relatively slow memory,
causing the CPU to stall while waiting for the instruction to be returned. This issue is
largely addressed in modern processors by caches and pipeline architectures (see below).
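A rough sense of why pipelining helps comes from a simple cycle-count model. This is an idealized sketch with assumed numbers; it ignores stalls, hazards, and memory latency entirely.

```python
# Idealized cycle counts: a k-stage pipeline finishes its first instruction
# after k cycles, then retires one instruction per cycle thereafter.
def sequential_cycles(n_instructions, stages):
    # Each instruction runs to completion before the next begins.
    return n_instructions * stages

def pipelined_cycles(n_instructions, stages):
    # Fill the pipeline once, then overlap the rest.
    return stages + (n_instructions - 1)

n, k = 1000, 5
print(sequential_cycles(n, k))  # 5000
print(pipelined_cycles(n, k))   # 1004
```

In this toy model the pipelined machine approaches a five-fold speedup as the instruction count grows, which is why even simple microcontroller-class CPUs use the classic five-stage pipeline.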
• Decode
The instruction that the CPU fetches from memory determines what the CPU will do. In
the decode step, performed by binary decoder circuitry known as the instruction decoder,
the instruction is converted into signals that control other parts of the CPU.
The way in which the instruction is interpreted is defined by the CPU's instruction set
architecture (ISA).[e] Often, one group of bits (that is, a "field") within the instruction,
called the opcode, indicates which operation is to be performed, while the remaining
fields usually provide supplemental information required for the operation, such as the
operands. Those operands may be specified as a constant value (called an immediate
value), or as the location of a value that may be a processor register or a memory address,
as determined by some addressing mode.
In some CPU designs the instruction decoder is implemented as a hardwired,
unchangeable binary decoder circuit. In others, a microprogram is used to translate
instructions into sets of CPU configuration signals that are applied sequentially over
multiple clock pulses. In some cases the memory that stores the microprogram is
rewritable, making it possible to change the way in which the CPU decodes instructions.
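Extracting the opcode and operand fields can be illustrated with a made-up 16-bit instruction format. The field widths below are assumptions chosen for the example, not any real encoding.

```python
# Hypothetical 16-bit format: a 4-bit opcode followed by two 6-bit
# operand fields.  Decoding is just shifting and masking.
def decode(instruction):
    opcode = (instruction >> 12) & 0xF
    op_a = (instruction >> 6) & 0x3F
    op_b = instruction & 0x3F
    return opcode, op_a, op_b

# Build an instruction word: opcode 1, operands 2 and 3.
word = (0b0001 << 12) | (2 << 6) | 3
print(decode(word))  # (1, 2, 3)
```

A hardwired instruction decoder does the same field extraction with wires and gates rather than arithmetic, routing each field to the unit that consumes it.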
• Execute
After the fetch and decode steps, the execute step is performed. Depending on the CPU
architecture, this may consist of a single action or a sequence of actions. During each
action, control signals electrically enable or disable various parts of the CPU so they can
perform all or part of the desired operation. The action is then completed, typically in
response to a clock pulse. Very often the results are written to an internal CPU register for
quick access by subsequent instructions. In other cases results may be written to slower,
but less expensive and higher capacity main memory.
For example, if an instruction that performs addition is to be executed, registers
containing operands (numbers to be summed) are activated, as are the parts of the
arithmetic logic unit (ALU) that perform addition. When the clock pulse occurs, the
operands flow from the source registers into the ALU, and the sum appears at its output.
On subsequent clock pulses, other components are enabled (and disabled) to move the
output (the sum of the operation) to storage (e.g., a register or memory). If the resulting
sum is too large (i.e., it is larger than the ALU's output word size), an arithmetic overflow
flag will be set, influencing the next operation.
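The addition example above can be sketched in a few lines, assuming an unsigned 8-bit ALU output word. Note that this checks unsigned carry-out; signed overflow detection uses a different rule.

```python
# Sketch of the execute step for addition: an ALU with a fixed word size
# sets an overflow flag when the true sum does not fit in the output word.
WORD_BITS = 8
MASK = (1 << WORD_BITS) - 1  # 0xFF for an 8-bit word

def alu_add(a, b):
    total = a + b
    overflow = total > MASK   # sum exceeds the 8-bit output width
    return total & MASK, overflow

print(alu_add(100, 27))   # (127, False)
print(alu_add(200, 100))  # (44, True): 300 wraps to 300 - 256 = 44
```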
Chapter 3

HOW PROCESSORS ARE MADE


The CPU, sometimes known as a "microprocessor," is the heart of modern computers.
The specs and frequencies of the CPU are frequently regarded as crucial indicators of a
computer's performance in PCs. The Intel x86 architecture has been around for more
than two decades, and x86 CPUs have had a significant impact on most of our work and
lives.
Fig 3.1 Transistor
The transistor is the most critical component in the CPU. The key to improving the
speed of the CPU is fitting more transistors into the same chip area; because the chip is
so small and precise, and contains so many transistors, it cannot be assembled by hand
and can only be produced by photolithography. This is why a CPU can hold so many
transistors. A transistor is a switch with two positions: on and off. In the early days of
computing, that was all a computer needed to get the job done: two states, on and off,
representing 0 and 1. So, how would you go about making a CPU? This chapter walks
through the entire process of creating a central processing unit, from a pile of sand to a
powerful integrated circuit chip, step by step.

3.1 Basic raw materials for making CPUs


If you ask what the raw material of the CPU is, almost everyone will say silicon. True,
but where does the silicon come from? It comes, in fact, from ordinary sand. It is hard
to believe that something as expensive, complicated, and powerful as a CPU emerges
from such a cheap material, and the transformation requires a demanding production
process. To manufacture the raw material, you cannot simply grab a handful of sand;
the silicon must be carefully selected and refined to an extremely high purity. A
processor made from unrefined material could never deliver the performance we expect
today.
Metal is another significant component of a CPU, in addition to silicon. Copper has
largely replaced aluminum as the primary metal used in the manufacture of a
processor's internal interconnects, in large part because copper has much better
electromigration characteristics than aluminum at current CPU operating voltages.
When a large number of electrons flow through a conductor, atoms of the conductor are
struck by the electrons and displaced from their original positions, leaving voids; this is
known as the electromigration problem. Displaced atoms that settle in other positions
can cause short circuits elsewhere and disrupt the chip's logic, rendering it useless.
Aside from these two basic elements, several chemical raw materials are also necessary
in the chip manufacturing process; they serve various functions that will not be
discussed here.
3.2 A preparatory stage for CPU manufacturing
After the appropriate raw materials have been collected, some of them must be
preprocessed. Silicon processing is critical, since silicon is the most significant raw
material. The silicon feedstock is first chemically purified, bringing it to a grade
suitable for use in the semiconductor industry. The silicon must then be shaped to meet
the processing requirements of integrated circuit production: the raw material is melted,
and the liquid silicon is poured into massive high-temperature quartz containers.

Fig 3.2 CPU manufacturing


The raw material is then melted at a very high temperature. Many materials, including
silicon, have atoms arranged in a crystalline structure, as we learned in middle school
chemistry class. To meet the requirements of high-performance CPUs, the monolithic
silicon raw material must be exceedingly pure and monocrystalline. The silicon is then
slowly rotated and drawn out of the high-temperature container, forming a cylindrical
silicon ingot. Under the common technique of the time, the circular cross-section of the
ingot has a diameter of 200 mm.

Fig 3.3 cylindrical silicon


However, Intel and a few other companies have begun to employ silicon ingots with a
diameter of 300 mm. Expanding the cross-sectional area while maintaining the ingot's
material properties is difficult, but it is achievable if a company is willing to invest
heavily in research. Intel spent around $3.5 billion to build a factory to develop and
produce 300 mm silicon ingots, and the success of the new technology has allowed
Intel to produce more complex and powerful integrated circuits. By comparison, a
factory for 200 mm ingots cost about $1.5 billion.
The next step is to slice the cylindrical silicon ingot once it has been verified to be a
true cylinder. The thinner the slice, the less material is used and the more processor
chips can be made. The slices are also polished to a mirror finish to provide a perfectly
smooth surface, then inspected for deformation and other defects. Quality inspection at
this step is crucial, because it directly affects the quality of the finished CPU.
The new slices are doped with chemicals to turn them into true semiconductors, and
transistor circuits representing various logic functions are then patterned onto them.
The atoms of the dopants penetrate the spaces between the silicon atoms, where they
interact through atomic forces, giving the silicon semiconductor properties. Today's
semiconductor manufacturing uses CMOS (Complementary Metal Oxide
Semiconductor) processes.
"Complementary" refers to the interplay between the N-type MOS transistor and the
P-type MOS transistor in the semiconductor; in electronics, N and P stand for negative
and positive, respectively. In most cases the slice is doped to create a P-type substrate,
and the logic circuits patterned on it follow the characteristics of nMOS transistors,
which use less space and consume less energy. At the same time, the formation of
unwanted pMOS transistors must be limited as much as possible, because N-type
material is implanted into the P-type substrate later in the manufacturing process, and
that procedure produces pMOS transistors.
When the doping work is complete, the conventional slice preparation is finished. The
slices are then heated in a high-temperature furnace, with the heating time controlled to
grow a silicon dioxide layer on the slice's surface. The thickness of the silicon dioxide
layer can be regulated by closely monitoring temperature, gas composition, and heating
time. In Intel's 90-nanometer manufacturing process, the gate oxide is as thin as 5
atomic layers. This layer forms part of the transistor's gate, which regulates the flow of
electrons between the transistor's terminals: by controlling the gate voltage, the flow of
electrons is strictly controlled regardless of the voltage at the input and output ports.
The final stage of preparation is overlaying a photosensitive layer (photoresist) on top
of the silicon dioxide layer. Once dried, this material is highly photosensitive, and after
the photolithography step the exposed portions can be dissolved and removed with
chemical processes.
3.3 Photolithography
In the modern CPU manufacturing process, this is an extremely difficult stage. Why?
The photolithography technique carves the appropriate notches into the photosensitive
layer with light of a specific wavelength, thereby altering the chemical characteristics
of the material. The technology places high demands on the wavelength of the light
employed, requiring short-wavelength ultraviolet light and lenses with large curvature.
Smudges on the wafer can ruin the etch process, and etching itself is a sophisticated and
sensitive procedure with many steps.
Each CPU requires more than 20 etch steps (one layer of etching per step), and the
amount of data required to describe each step of the process is measured in tens of
gigabytes. Furthermore, if the etched patterns of each layer were enlarged many times,
they could be compared to a map of New York City and its suburbs, and they are
considerably more intricate. Imagine reducing the entire map of New York to just 100
square millimeters; that gives a sense of how intricate the chip's structure is.

When all of these preparation steps are completed, the wafer is exposed. Short-
wavelength light shines through the hollow notches of a quartz stencil (the mask) onto
the photosensitive layer of the wafer, and then the light and the stencil are removed.
The exposed photosensitive material is chemically removed, uncovering the silicon
dioxide directly below the vacated locations.
3.4 Doping
The silicon dioxide layer that fills the trenches, and the exposed silicon layer
underneath it, remain after the leftover photosensitive material is removed. Another
layer of silicon dioxide is then applied, followed by a polysilicon layer and another
photosensitive layer. Polysilicon is another form of gate material: because of the
conductive material employed (hence the term metal-oxide-semiconductor), polysilicon
allows gates to be formed before the transistor's port voltages take effect. Short-
wavelength light again erodes the photosensitive layer through the mask, and after
another etching, essentially all of the required circuits have been constructed. The
exposed silicon layer is then bombarded with ions to form either an N-channel or a
P-channel. This doping process creates all of the transistors and their electrical
connections. Each transistor has two inputs and outputs, and the region between them is
known as a port.
Fig 3.4 Doping process
Fig 3.5 Design process
3.5 Repeating the process
After this, you keep adding layers: another layer of silicon dioxide, then lithography
once more. Repeating these processes yields a multi-layered, three-dimensional
structure, which is what today's processors are. Metal coating films between the layers
provide the conductive links between them. Pentium 4 CPUs have seven layers of metal
interconnect, whereas the Athlon 64 has nine. The number of layers used is determined
by the original layout design and has no direct bearing on the final product's
performance.
3.6 Testing and packaging
Over the following few weeks the wafers are inspected one by one, including electrical
characteristics testing to check for logical problems and, if any are found, to determine
on which layer they appeared. Following that, each defective chip unit on the wafer is
individually examined to determine whether it needs any special handling. This
completes the manufacturing process.

Chapter 4

PROCESSOR ARCHITECTURE
Architecture: Modern processors are typically based on either x86 or ARM
architectures. x86 processors are commonly found in desktops and laptops, while ARM
processors are prevalent in smartphones, tablets, and embedded systems.
Manufacturing Process: Processors are manufactured using semiconductor fabrication
processes. In recent years, they have been produced using advanced technologies like
7nm, 5nm, and even smaller nodes, which enable greater performance and efficiency.
Core Count: Processors consist of multiple cores, each capable of executing tasks
independently. Multi-core processors are common nowadays, ranging from dual-core to
octa-core and beyond, allowing for parallel processing and improved multitasking.
Clock Speed: Clock speed, measured in GHz (gigahertz), determines how quickly a
processor can execute instructions. Higher clock speeds generally indicate better
performance, but other factors like architecture and core count also play significant roles.
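The relationship between clock speed and throughput can be made concrete with a toy calculation. The frequency and IPC (instructions per cycle) values below are illustrative assumptions, not measurements of a particular processor.

```python
# Clock frequency sets the cycle time; throughput also depends on how
# many instructions complete per cycle (IPC), which varies with the
# architecture and the workload.
def cycle_time_ns(freq_ghz):
    return 1.0 / freq_ghz  # e.g. 1 GHz -> 1 ns per cycle

def instructions_per_second(freq_ghz, ipc):
    return freq_ghz * 1e9 * ipc

print(cycle_time_ns(4.0))               # 0.25 ns per cycle at 4 GHz
print(instructions_per_second(4.0, 2))  # 8 billion instructions/s
```

This is why a 3 GHz chip with a high IPC can outperform a 4 GHz chip with a low one: clock speed alone does not determine performance.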
Cache Memory: Processors contain cache memory, which is much faster than main
memory (RAM). This cache memory helps reduce the time it takes for the processor to
access frequently used data and instructions, thus enhancing performance.
Integrated Graphics: Many modern processors come with integrated graphics processing
units (GPUs). These GPUs are capable of handling graphics-related tasks without the
need for a separate graphics card, making them suitable for everyday computing and light
gaming.
Power Efficiency: With the increasing demand for mobile devices and energy-efficient
computing, modern processors are designed to strike a balance between performance and
power consumption. This is achieved through various techniques such as dynamic
voltage and frequency scaling (DVFS) and low-power states.
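The benefit of DVFS can be sketched with the standard dynamic-power approximation P ≈ C·V²·f. The capacitance, voltage, and frequency values below are illustrative assumptions, not measurements of any real chip.

```python
# Dynamic switching power is roughly P = C * V^2 * f.  Because voltage
# enters squared, lowering V and f together (as DVFS does) gives an
# outsized power saving for a modest performance cost.
def dynamic_power(capacitance_f, voltage_v, freq_hz):
    return capacitance_f * voltage_v ** 2 * freq_hz

full = dynamic_power(1e-9, 1.2, 3e9)   # full-speed operating point
saved = dynamic_power(1e-9, 0.9, 2e9)  # scaled-down operating point
print(full, saved, saved / full)       # power drops to 37.5% of full
```

Here a one-third drop in frequency combined with a modest voltage reduction cuts dynamic power by nearly two thirds, which is why mobile processors shift operating points constantly.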
Instruction Set: Processors support various instruction sets, which determine the types of
operations they can perform. Common instruction sets include x86, ARM, and their
respective extensions, which provide support for multimedia, encryption, and other
specialized tasks.
Modern microprocessors are among the most complex systems ever created by humans.
A single silicon chip, roughly the size of a fingernail, can contain a complete high-
performance processor, large cache memories, and the logic required to interface it to
external devices. In terms of performance, the processors implemented on a single chip
today dwarf the room-sized supercomputers that cost over $10 million just 20 years ago.
Even the embedded processors found in everyday appliances such as cell phones,
personal digital assistants, and handheld game systems are far more powerful than the
early developers of computers ever envisioned.
Thus far, we have only viewed computer systems down to the level of machine-language
programs. We have seen that a processor must execute a sequence of instructions, where
each instruction performs some primitive operation, such as adding two numbers. An
instruction is encoded in binary form as a sequence of 1 or more bytes. The instructions
supported by a particular processor and their byte-level encodings are known as its
instruction-set architecture (ISA). Different “families” of processors, such as Intel IA32,
IBM/Freescale PowerPC, and the ARM processor family have different ISAs. A program
compiled for one type of machine will not run on another. On the other hand, there are
many different models of processors within a single family. Each manufacturer produces
processors of ever-growing performance and complexity, but the different models
remain compatible at the ISA level. Popular families, such as IA32, have processors
supplied by multiple manufacturers. Thus, the ISA provides a conceptual layer of
abstraction between compiler writers, who need only know what instructions are
permitted and how they are encoded, and processor designers, who must build machines
that execute those instructions. In this chapter, we take a brief look at the design of
processor hardware. We study the way a hardware system can execute the instructions of
a particular ISA. This view will give you a better understanding of how computers work
and the technological challenges faced by computer manufacturers. One important
concept is that the actual way a modern processor operates can be quite different from the
model of computation implied by the ISA. The ISA model would seem to imply
sequential instruction execution, where each instruction is fetched and executed to
completion before the next one begins. By executing different parts of multiple
instructions simultaneously, the processor can achieve higher performance than if it
executed just one instruction at a time. Special mechanisms are used to make sure the
processor computes the same results as it would with sequential execution. This idea of
using clever tricks to improve performance while maintaining the functionality of a
simpler and more abstract model is well known in computer science.
Examples include the use of caching in Web browsers and information retrieval data structures
such as balanced binary trees and hash tables. Chances are you will never design your own
processor. This is a task for experts working at fewer than 100 companies worldwide. Why, then,
should you learn about processor design?
• It is intellectually interesting and important. There is an intrinsic value in learning how things
work. It is especially interesting to learn the inner workings of a system that is such a part of the
daily lives of computer scientists and engineers and yet remains a mystery to many. Processor
design embodies many of the principles of good engineering practice. It requires creating a simple
and regular structure to perform a complex task.
• Understanding how the processor works aids in understanding how the overall computer system
works. The memory system, for example, uses related techniques to create the image of a very
large memory with a very fast access time. Seeing the processor side of the processor-memory
interface makes the picture more complete.
• Although few people design processors, many design hardware systems that contain processors.
This has become commonplace as processors are embedded into real-world systems such as
automobiles and appliances. Embedded-system designers must understand how processors work,
because these systems are generally designed and programmed at a lower level of abstraction than
is the case for desktop systems.
• You just might work on a processor design. Although the number of companies producing
microprocessors is small, the design teams working on those processors are already large and
growing. There can be over 1000 people involved in the different aspects of a major processor
design.
4.1 Brief history of CPU Architecture
Let's revisit where it all started. In 1945, at the early beginnings of digital computing,
most computers were very large, slow, and fragile.
ENIAC (Electronic Numerical Integrator and Computer), the first general-purpose
digital computer, completed in 1946, covered approximately 168 square meters of floor
area, comparable to the size of a modern house. ENIAC was built using vacuum tube
technology, which made such machines huge and unreliable. These were program-
controlled computers: an operator would program the machine with switches and wires
for each new calculation (tedious, right? Well, that was computing in 1946). Although
these computers were general-purpose, programming them was complicated and
error-prone.
In the mid-1940’s John Von Neumann, proposed the stored-program computer model.
This model reimagined the general-purpose computer as three separate components:
Memory: a component for storing data and instructions
Central Processing Unit: for decoding and executing instructions
Inputs/Outputs: a set of input and output interfaces
The Von Neumann architecture also introduced the four-step cycle: fetching
instructions from memory, decoding the instructions, executing the instructions, and
storing the results back to memory. The late 1940s saw the advancement of
semiconductor-based transistor technology, which came to replace vacuum tube
technology in most CPU designs.
Computing Abstraction Layers
There are two key concepts essential to comprehending how a CPU works: the binary
system, and computing abstraction layers.
Fig 4.1 The Von Neumann CPU Architecture
Computers use a number system based on zeros and ones: for a computer to function,
all information, data, and numbers must be represented as simple ON or OFF states.
The characters and numbers we use to communicate can easily be translated into
binary, as in the ASCII representation of the characters of the alphabet.
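The translation from characters to bit patterns can be seen directly in Python, using each character's ASCII code point:

```python
# Every character maps to a number (its ASCII/Unicode code point),
# which the hardware stores as a pattern of on/off bits.
for ch in "CPU":
    print(ch, ord(ch), format(ord(ch), "08b"))
# C 67 01000011
# P 80 01010000
# U 85 01010101
```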
Computing abstraction layers refer to how you can start with very simple things like
atoms and transistors and, by adding abstraction layer upon abstraction layer, build up
to complex applications running in large data centers. At the foundation we have
atoms, which are combined into materials like silicon, from which we build simple
transistors. These transistors act as switches that turn on or off with the application of
an electric current or voltage signal. By connecting numerous switches together in a
precise arrangement we form gates, the fundamental Boolean logic operators for
performing calculations (AND, OR, and NOT gates). We can then abstract ones and
zeros into a language of logic that is more practical to reason about than the language
of physics and the flow of electrons. Using transistors as
switches and connecting the output of one to the input of another we can build a variety
of logic circuits or functional blocks. These functional blocks can take the form of adders,
multiplexors, flip-flops, latches, registers, counters, decoders, etc. Chaining functional
blocks together allows for even more complex logical functions.
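The idea of composing gates into functional blocks can be sketched in a few lines. Here the Boolean operators are modeled as Python functions and chained into a half adder (the names and structure are illustrative, not a hardware description):

```python
# Gates as tiny Boolean functions on bits (0 or 1); chaining them builds
# larger functional blocks, here a half adder that produces the sum and
# carry bits of two one-bit inputs.
def AND(a, b): return a & b
def OR(a, b):  return a | b
def NOT(a):    return 1 - a
def XOR(a, b): return OR(AND(a, NOT(b)), AND(NOT(a), b))

def half_adder(a, b):
    return XOR(a, b), AND(a, b)  # (sum bit, carry bit)

print(half_adder(1, 1))  # (0, 1): 1 + 1 = binary 10
```

Chaining half adders (plus an OR for the carries) yields a full adder, and chaining full adders yields the multi-bit adders found inside an ALU.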
With these complex logical functions, we can build custom execution units that perform
specific calculations. One of the most important execution units in a CPU is the ALU
(arithmetic logic unit). Designing a whole CPU comes down to building multiple
specialized processing elements and connecting them in ways that permit complex
computation. The combination of those elements into a system that can fetch
instructions from memory, decode them, execute them, and store the results back into
memory is referred to as a microarchitecture, or the system on a chip.
Fig 4.2 Computing Abstraction layers
The instruction set architecture (ISA) is the set of instructions that defines what kinds
of operations can be performed on the hardware; it is, in effect, the language of the
computer. Just as languages like English and French have dictionaries that describe
their words, format, grammatical syntax, meaning, and pronunciation, the ISA is an
abstract model, sometimes referred to as the computer architecture, that defines the
memory model, supported data types, registers, and the behavior of machine code, i.e.
the sequences of zeros and ones that the CPU must execute. Types of ISAs include x86,
ARM (Advanced RISC Machines), and MIPS (Microprocessor without Interlocked
Pipeline Stages). The ISA acts as a bridge between hardware and software. On the
software side, a compiler transforms code written in a high-level language like C,
Python, or Java into the machine-code instructions the CPU can execute. On the
hardware side, when designing the CPU microarchitecture, the ISA serves as a design
specification that tells the engineer what set of instructions and data types the CPU is
supposed to interpret and execute.
The instructions in the ISA are implementation-independent, this means that if a different
company creates different microarchitecture designs they can all run the same code based
on the same ISA. Computer architects continue to evolve ISAs through extensions to the
instruction sets. These additional instructions are often created to perform certain
operations more efficiently, leveraging new processing elements in their
microarchitecture. These ISA extensions increase CPU performance by streamlining
operations for a particular arrangement of processing elements. Modern CPUs support
thousands of different operations, many of which are related to arithmetic
operations(addition, subtraction, division, multiplication), logical operations(AND, NOT,
OR), memory operations(loading, storing, moving), and flow control like branching. This
is a simplified explanation of ISAs; modern ISAs are much more complex than can be
covered in this short report. ISAs are one of the most critical parts of modern CPU
design, as they are the linchpin between hardware and software, enabling
high-performance computation and seamless software experiences across a variety of
CPU microarchitectures.
4.2 Future trends in processors
In the ever-evolving world of technology, processors are the beating heart of innovation.
These tiny chips are the brainpower behind our devices, enabling them to perform tasks
faster and more efficiently. As we look ahead, the future of processors promises exciting
developments that will shape the way we live, work, and play. This section explores
some of the emerging technologies and trends that are set to revolutionize processors in
the years to come.
Quantum Computing:
Quantum computing is no longer a concept from science fiction. It's a rapidly advancing
field that has the potential to revolutionize computation as we know it. Unlike classical
processors, which use bits, quantum processors use quantum bits or qubits. These qubits
can exist in multiple states simultaneously, enabling them to solve complex problems
exponentially faster than classical processors. While practical quantum computers are still
in their infancy, they hold immense promise in fields like cryptography, drug discovery,
and climate modeling.
AI Integration:
Artificial Intelligence (AI) is already a part of our daily lives, from virtual assistants to
recommendation algorithms. The future of processors will see greater integration of AI
capabilities directly into the hardware. This will lead to more efficient AI processing,
enabling devices to make decisions and adapt to user preferences in real time. It will also
open up new possibilities in areas such as autonomous vehicles and healthcare.
Neuromorphic Processors:
Inspired by the human brain, neuromorphic processors are designed to process
information in a way that mimics the brain's neural networks. These processors are highly
energy-efficient and excel at tasks like pattern recognition and sensory processing. In the
future, we can expect neuromorphic processors to play a crucial role in robotics,
prosthetics, and even brain-computer interfaces.
3D Stacking:
To keep up with the demand for increased processing power while maintaining energy
efficiency, processors are moving towards 3D stacking. Traditional processors are flat,
with components arranged in two dimensions. 3D stacking involves layering components
vertically, reducing the distance data needs to travel and improving performance. This
technology will pave the way for thinner and more powerful devices.
Edge Computing:
Edge computing is a paradigm shift from traditional cloud computing. It involves
processing data closer to the source, rather than sending it to distant data centers.
Processors in edge devices will need to be powerful and efficient to handle real-time
tasks. This trend is essential for applications like autonomous vehicles, smart cities, and
IoT devices.
Advanced Materials:
Breakthroughs in materials science are driving the development of processors made from
new materials. Graphene, for instance, has remarkable electrical properties that could
lead to incredibly fast and energy-efficient processors. Researchers are also exploring
other 2D materials and exotic materials like topological insulators for future processors.
Security Enhancements:
With the increasing complexity of technology, security is a growing concern. Future
processors will prioritize security features, including hardware-based encryption and
secure enclaves, to protect user data and privacy.
Energy Efficiency:
As our reliance on technology grows, so does the demand for energy-efficient processors.
Manufacturers are working on designs that maximize performance per watt, reducing the
environmental impact and the need for bulky cooling systems.
CONCLUSION

REFERENCES
