Reviewed CSC303 Compiled Note 2023/24
COURSE CONTENT
MODULE THREE
Intel 80X86 Programming Model. Registers – Types of Registers.
X86 Register set, Instruction types. Week 9-11
MODULE FOUR
Numbering System. Computer Arithmetic. Conversion between radix.
Binary, Octal, Hexadecimal, Signed and Unsigned number representation
Ones and Twos Complement. Week 12-13
MODULE FIVE
Floating Point Arithmetic.
IEEE 754 Architecture, conversion standards and storage, Single, double precision and
counterparts, Programming Model
In-class test Week 14
Revision Week 15
Examination
Lecture note on Computer Architecture and Organization CSC303
Course Reference Resources
MODULE ONE
The ancestors of the modern-age computer were mechanical and electromechanical devices. These include Blaise Pascal's machine, the Difference Engine, the Analytical Engine, ENIAC, EDSAC, EDVAC, UNIVAC, MARK I, II, III, etc.
Computer technology has made incredible improvement in the past half century. In the early part of computer evolution, there were no stored-program computers, computational power was low, and computers were very large. Nowadays, a personal computer has more computational power, more memory and disk storage, is smaller in size, and is available at an affordable cost. This rapid improvement is a result of advances in the technology used to build computers and of innovation in computer design.
The Von Neumann Machine. This is also referred to as the stored program computers.
Stored program computers have the following characteristics:
- Three hardware systems:
• A central processing unit (CPU)
• A main memory system
• An I/O system
- The capacity to carry out sequential instruction processing.
- A single data path between the CPU and main memory. This single path is known as the Von Neumann bottleneck.
For example, to send data to the output device, the CPU places the device address on the address bus, places the data on the data bus, and enables the output device.
System Buses
Buses are wires connecting memory and I/O to the microprocessor. There are three main types of buses:
– Address Bus
• Unidirectional
• Identifying peripheral or memory location
– Data Bus
• Bidirectional
• Transferring data
– Control Bus
• Synchronization signals
• Timing signals
• Control signals
Computer Architecture and Computer Organization
Changes in technology not only influence organization but also result in the introduction of more powerful and more complex architectures. However, because a computer organization must be designed to implement a particular architectural specification, a thorough treatment of organization requires a detailed examination of architecture as well. Computer architecture comes before computer organization.
Computer architecture and computer organization are related but distinct concepts.
Computer Architecture refers to the design of the internal workings of a computer system,
including the CPU, memory, and other hardware components. It involves decisions about
the organization of the hardware, such as the instruction set architecture, the data path
design, and the control unit design.
Computer Architecture is concerned with optimizing the performance of a computer system
and ensuring that it can execute instructions quickly and efficiently.
On the other hand,
Computer Organization refers to the operational units and their interconnections that
implement the architecture specification. It deals with how the components of a computer
system are arranged and how they interact to perform the required operations.
Computer Organization is how operational attributes are linked together and contribute to realizing the architectural specification; hence, Computer Organization deals with structural relationships.
Summary difference between Computer Architecture and Computer Organization:
A computer system, like any system, consists of an interrelated set of components. The system
is best characterized in terms of structure, the way in which components are interconnected,
and function, the operation of the individual components. Furthermore, a computer’s
organization is hierarchical.
Each major component can be further described by decomposing it into its major
subcomponents and describing their structure and function.
Function
Both the structure and functioning of a computer are, in essence, simple. In general terms,
there are only four basic functions that a computer can perform:
• Data processing: Data may take a wide variety of forms, and the range of processing
requirements is broad.
• Data storage: Even if the computer is processing data on the fly (i.e., data come in and
get processed, and the results go out immediately), the computer must temporarily
store at least those pieces of data that are being worked on at any given moment. Thus,
there is at least a short-term data storage function. Equally important, the computer performs a long-term data storage function. Files of data are stored on the computer for subsequent retrieval and update.
• Data movement: The computer’s operating environment consists of devices that serve
as either sources or destinations of data. When data are received from or delivered to a
device that is directly connected to the computer, the process is known as input-output (I/O), and the device is referred to as a peripheral. When data are moved over longer
distances, to or from a remote device, the process is known as data communications.
• Control: Within the computer, a control unit manages the computer’s resources and
orchestrates the performance of its functional parts in response to instructions.
Structure
There are four main structural components:
• Central processing unit (CPU): Controls the operation of the computer and performs
its data processing functions; often simply referred to as processor.
• Main memory: Stores data.
• I/O: Moves data between the computer and its external environment.
• System interconnection: Some mechanism that provides for communication among
CPU, main memory, and I/O. A common example of system interconnection is by
means of a system bus, consisting of a number of conducting wires to which all the
other components attach.
1. Bus
A bus is a bundle of wires grouped together to serve a single purpose. The main purpose of
the bus is to transfer data from one device to another. The processor's interface to the bus includes connections used to pass data, connections to represent the address the processor is interested in, and control lines to manage and synchronize the transaction. The
three major buses are Data, Address and Control buses. There are internal buses that the
processor uses to move data, instructions, configuration, and status between its subsystems.
• The Data Bus provides a path for moving data among system modules. The data bus
may
consist of 32, 64, 128, or even more separate lines, the number of lines being referred
to as the width of the data bus. Because each line can carry only 1 bit at a time, the
number of lines determines how many bits can be transferred at a time. The width of
the data bus is a key factor in determining overall system performance. A narrower
bus width means that it will take more time to communicate a quantity of data as
compared to a wider bus. For example, if the data bus is 32 bits wide and each
instruction is 64 bits long, then the processor must access the memory module twice
during each instruction cycle.
• The Address Bus is used to designate the source or destination of the data on the data
bus. For example, if the processor wishes to read a word (8, 16, or 32 bits) of data from
memory, it puts the address of the desired word on the address lines. Clearly, the
width of the address bus determines the maximum possible memory capacity of
the system. Address space refers to the maximum amount of memory and I/O that a
microprocessor can directly address.
If a microprocessor has a 16-bit address bus, it can address up to 2^16 = 65,536 bytes.
Therefore, it has a 64 kB address space, i.e.
1 byte = 8 bits
1,024 bytes = 1 kB
65,536 bytes = 64 kB
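The relationship between address-bus width and address space can be checked with a short sketch (the bus widths shown are illustrative examples, not tied to any particular processor):

```python
# Compute the maximum directly addressable memory for a given
# address-bus width: an n-bit bus selects one of 2^n locations.
def address_space_bytes(bus_width_bits):
    """Maximum number of directly addressable bytes."""
    return 2 ** bus_width_bits

for width in (16, 20, 24, 32):
    size = address_space_bytes(width)
    print(f"{width}-bit address bus -> {size:,} bytes ({size // 1024:,} kB)")
```

A 16-bit bus gives the 64 kB space worked out above; a 20-bit bus (as on the 8086 discussed later) gives 1 MB.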
Furthermore, the address lines are generally also used to address I/O ports. Note that the
address bus is unidirectional (the microprocessor asserts requested addresses to the various
devices), and the data bus is bidirectional (the microprocessor asserts data on a write and the
devices assert data on reads).
The Control Bus is used to control the access to and the use of the data and address
lines.
Because the data and address lines are shared by all components, there must be a
means of controlling their use. Control signals transmit both command and timing
information among system modules. Timing signals indicate the validity of data and
address information. Command signals specify operations to be performed. Typical
control lines include:
• Memory write: Causes data on the bus to be written into the addressed location
• Memory read: Causes data from the addressed location to be placed on the bus
• I/O write: Causes data on the bus to be output to the addressed I/O port
• I/O read: Causes data from the addressed I/O port to be placed on the bus
• Transfer ACK: Indicates that data have been accepted from or placed on the bus
• Bus request: Indicates that a module needs to gain control of the bus
• Bus grant: Indicates that a requesting module has been granted control of the bus
• Interrupt request: Indicates that an interrupt is pending
• Interrupt ACK: Acknowledges that the pending interrupt has been recognized
• Clock: Is used to synchronize operations
• Reset: Initializes all modules
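The way the three buses cooperate during a transaction can be sketched as a toy model (this is not any real bus protocol; the signal names simply mirror the control lines listed above):

```python
# A minimal sketch of memory read/write bus cycles. The "memory" dict
# stands in for the addressed locations on the data bus.
memory = {}

def bus_cycle(control, address, data=None):
    """One simplified bus transaction driven by a control signal."""
    if control == "MEMORY_WRITE":
        memory[address] = data        # data bus -> addressed location
        return None
    if control == "MEMORY_READ":
        return memory.get(address)    # addressed location -> data bus
    raise ValueError("unknown control signal")

# The CPU places address 0x1000 on the address bus, 0x42 on the data
# bus, and asserts Memory write; then it reads the location back.
bus_cycle("MEMORY_WRITE", 0x1000, 0x42)
print(hex(bus_cycle("MEMORY_READ", 0x1000)))  # 0x42
```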
2. Registers
Registers are temporary storage locations in the CPU. A register stores a binary value using
a group of latches. Although variables and pointers used in a program are all stored in
memory, they are moved to registers during periods in which they are the focus of operation.
This is so that they can be manipulated quickly. Once the processor shifts its focus, it stores
the values it doesn't need any longer back in memory. Registers may be used for several
operations. Discussion on types and usage of registers will follow in Module III of this
document.
3. Buffers
A processor does not operate in isolation. Typically there are multiple processors
supporting the operation of the main processor. These include video processors, the
keyboard and mouse interface processor, and the processors providing data from hard
drives and CDROMs. There are also processors to control communication interfaces
such as USB, and Ethernet networks. These processors all operate independently, and
therefore one may finish an operation before a second processor is ready to receive the
results.
If one processor is faster than another or if one processor is tied up with a process
prohibiting it from receiving data from a second process, then there needs to be a
mechanism in place so that data is not lost. This mechanism takes the form of a block of
memory that can hold data until it is ready to be picked up. This block of memory is called
a buffer. Figure 3 below presents the basic block diagram of a system that incorporates a buffer.
Instead of processor A passing data directly to processor B, processor A stores data into the buffer and processor B reads the data from the buffer when it is ready.
[Figure 3: Processor A writes into the buffer (a "memory queue"); Processor B reads from it.]
The concept of buffers is presented here because the internal structure of a
processor often relies on buffers to store data while waiting for an external device
to become available.
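The buffer mechanism described above can be sketched with two threads and a FIFO queue (a minimal model; the processor names and buffer size are illustrative):

```python
# Processor A produces data; processor B consumes it. The queue is the
# block of memory (buffer) that holds data until B is ready to pick it up.
import queue
import threading

buffer = queue.Queue(maxsize=8)   # the "memory queue" between A and B
received = []

def processor_a():                # producer
    for value in range(5):
        buffer.put(value)         # data waits here until B is ready

def processor_b():                # consumer
    for _ in range(5):
        received.append(buffer.get())

a = threading.Thread(target=processor_a)
b = threading.Thread(target=processor_b)
a.start(); b.start()
a.join(); b.join()
print(received)   # all five values arrive in order; nothing is lost
```

Even if one side runs faster than the other, the buffer absorbs the speed mismatch so no data is dropped.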
4. The Stack
During the course of normal operation, there will be a number of times when the
processor needs to use a temporary memory, a place where it can store a number for a
while until it is ready to use it again.
For example, every processor has a finite number of registers. If an application needs
more registers than are available, the register values that are not needed immediately can
be stored in this temporary memory. When a processor needs to jump to a subroutine or
function, it needs to remember the instruction it jumped from so that it can pick back up
where it left off when the subroutine is completed. The return address is stored in this
temporary memory. The stack is a block of memory locations reserved to function as
temporary memory. It operates much like the stack of plates at the start of a restaurant
buffet line. When a plate is put on top of an existing stack of plates, the plate that was on
top is now hidden, one position lower in the stack. It is not accessible until the top plate is
removed. There are two main operations that the processor can perform on the stack: it
can either store the value of a register to the top of the stack or remove the top piece of
data from the stack and place it in a register. Storing data to the stack is referred to as
"pushing" while removing the top piece of data is called "popping". The LIFO nature of
the stack makes it so that applications must remove data items in the opposite order from
which they were placed on the stack. For example, assume that a processor needs to store
values from registers A, B, and C onto the stack. If it pushes register A first, B second, and
C last, then to restore the registers it must pull in order C, then B, then A. This is illustrated
in Figure 4a and 4b.
[Figure 4a: Registers A = 25, B = 83, C = 74. After the pushes the stack holds, from top to bottom: 74, 83, 25.]
Assume registers A, B, and C of a processor contain 25, 83, and 74 respectively. If the
processor pushes them onto the stack in the order A, then B, then C then pulls them off the
stack in the order B, then A, then C, what values do the registers contain afterwards? The
solution is explained as follows. First, let's see what the stack looks like after the values from
registers A, B, and C have been pushed. The data from register A is pushed first placing it at
the bottom of the stack of three data items. B is pushed next followed by C which sits at the
top of the stack. In the stack, there is no reference identifying which register each piece of
data came from.
[Figure 4b: Before the pulls the stack holds, from top to bottom: 74, 83, 25. After the pulls, register A = 83, B = 74, C = 25.]
When the values are pulled from the stack, B is pulled first and it receives the value from
the top of the stack, i.e., 74. Next, A is pulled. Since the 74 was removed and placed in B,
A gets the next piece of data, 83. Last, 25 is placed in register C.
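The worked example above can be reproduced with a Python list acting as the stack:

```python
# Push registers A, B, C onto the stack, then pull into B, A, C
# (the same order as the worked example).
stack = []
A, B, C = 25, 83, 74

stack.append(A)   # push A -> bottom of the stack
stack.append(B)   # push B
stack.append(C)   # push C -> top of the stack

B = stack.pop()   # pull top of stack (74) into B
A = stack.pop()   # pull next value (83) into A
C = stack.pop()   # pull last value (25) into C

print(A, B, C)    # 83 74 25
```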
5. I/O Ports
Input/output ports or I/O ports refer to any connections that exist between the
processor and its external devices. A USB printer or scanner, for example, is connected to
the computer system through an I/O port. The computer can issue commands and send
data to be printed through this port or receive the device's status or scanned images. Some
I/O devices are connected directly to the memory bus and act just like memory devices.
Sending data to the port is done by storing data to a memory address and retrieving data
from the port is done by reading from a memory address.
If the device is incorporated into the processor, then communication with the port is done by
reading and writing to registers. This is sometimes the case for simple serial and parallel
interfaces such as a printer port or keyboard and mouse interface.
PROCESSOR DESIGN APPROACH
One of the key features used to categorize a microprocessor is whether it supports reduced instruction set computing (RISC) or complex instruction set computing (CISC). The distinction lies in how complex individual instructions are and in how many variations exist for the same basic instruction. In practical terms, this distinction directly relates to the complexity of a microprocessor's instruction decoding logic; a more complex instruction set requires more complex decoding logic. The differences are tabulated in Table 1.
Table 1: Differences between CISC and RISC

CISC: Instructions and addressing modes are complex, hence complex instruction decode logic.
RISC: Simple instruction decode logic, since there are few instructions to decode and little operand complexity.

CISC: In a single instruction, many operations are embedded, e.g. fetch, add, increment and store operations all in one instruction.
RISC: Has a separate instruction for each operation, reducing complexity and speeding up the instructions that are frequently used.

CISC: Not all instructions are used with the same frequency; only some (a core set) are called most of the time.
RISC: Instructions that are not frequently used are removed so as to simplify the microprocessor control logic; the system can therefore execute programs faster, improving throughput for the commonly used instructions and increasing overall performance.

CISC: The instructions that are used less often impose a burden on the entire system, because they increase the permutations of the decode logic in a given clock cycle.
RISC: Reduces the permutations of the decode logic, since the instruction set is smaller and only a few memory read/write operations are needed.
Data creation is growing exponentially due to the explosion in big data and machine learning, and processor, storage and memory technology have all witnessed fundamental change in terms of size, speed, capacity and architecture; hence the demand for graphics processing units. GPUs are an ideal fit for many modern applications. A Central Processing Unit (CPU) is a latency-optimized general-purpose processor designed to handle a wide range of distinct tasks sequentially, while a Graphics Processing Unit (GPU) is a throughput-optimized specialized processor designed for high-end parallel computing, as illustrated in Figure 5.
CPU Architecture
A Central Processing Unit (CPU) is the brain of your computer. The main job of the CPU is to carry out a diverse set of instructions through the fetch-decode-execute cycle to manage all parts of your computer and run all kinds of computer programs.
A CPU is very fast at processing your data in sequence, as it has a few heavyweight cores with high clock speeds. It is like a Swiss army knife that can handle diverse tasks pretty well. The CPU is latency-optimized and can switch between a number of tasks very quickly, which may create an impression of parallelism. Nevertheless, fundamentally it is designed to run one task at a time.
GPU Architecture
A Graphics Processing Unit (GPU) is a specialized processor whose job is to rapidly manipulate
memory and accelerate the computer for a number of specific tasks that require a high degree of
parallelism.
As the GPU uses thousands of lightweight cores whose instruction sets are optimized for multidimensional matrix arithmetic and floating-point calculations, it is extremely fast with linear algebra and similar tasks that require a high degree of parallelism.
As a rule of thumb, if your algorithm accepts vectorized data, the job is probably well-suited
for GPU computing.
Architecturally, GPU’s internal memory has a wide interface with a point-to-point connection
which accelerates memory throughput and increases the amount of data the GPU can work with
in a given moment. It is designed to rapidly manipulate huge chunks of data all at once.
CPU: Performs fewer instructions per clock.
GPU: Performs more instructions per clock.
A GPU cannot replace a CPU in a computer system. The CPU is necessary to oversee the execution of tasks on the system. However, the CPU can delegate specific repetitive workloads to the GPU and free up its own resources for maintaining the stability of the system and the programs that are running.
GPU uses many lightweight processing cores, leverages data parallelism, and has high memory
throughput. While the specific components will vary by model, fundamentally most modern
GPUs use single instruction multiple data (SIMD) stream architecture.
FLYNN’S TAXONOMY
Two types of instruction stream (single or multiple) combined with two types of data stream (single or multiple) lead to the four different categories in Flynn's taxonomy. Let us take a look at each, as illustrated in Figure 6.
SISD (single instruction, single data) is an architecture where a single instruction stream (e.g. a program) executes on one data stream. This architecture is used in older computers with a single-core processor, as well as in many simple compute devices.
A SIMD (single instruction, multiple data) stream architecture has a single control processor and instruction memory, so only one instruction can be run at any given point in time. That single instruction is copied and run across each core at the same time. This is possible because each processor has its own dedicated memory, which allows for parallelism at the data level (a.k.a. "data parallelism").
The fundamental advantage of SIMD is that data parallelism allows it to execute computations quickly (multiple processors doing the same thing) and efficiently (only one instruction unit).
A MISD (multiple instruction, single data) stream architecture is effectively the reverse of the SIMD architecture. With MISD, multiple instructions are performed on the same data stream. The use cases for MISD are very limited today; most practical applications are better addressed by one of the other architectures.
A MIMD (multiple instruction, multiple data) stream architecture offers parallelism for both data and instruction streams. With MIMD, multiple processors execute instruction streams independently against different data streams.
Now that we understand the different architectures, let’s consider why SIMD is the best choice
for GPUs. The answer becomes intuitive when you understand that fundamentally graphics
processing and many other common GPU computing use cases are simply running the same
mathematical function over and over again at scale. In this case, many processors running the
same instruction on multiple data sets is ideal.
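The contrast between SISD and SIMD execution can be mirrored, purely conceptually, in a short sketch (real SIMD happens in hardware; this only illustrates the one-instruction-over-many-data idea):

```python
# The same mathematical function (y = 2x + 1) applied to a data stream.
data = [1.0, 2.0, 3.0, 4.0]

# SISD: one instruction stream walks one data stream, element by element.
sisd_result = []
for x in data:
    sisd_result.append(x * 2.0 + 1.0)

# SIMD (conceptually): the *same* instruction is applied across every
# element of the data stream at once -- data parallelism.
simd_result = list(map(lambda x: x * 2.0 + 1.0, data))

print(sisd_result == simd_result)  # same math, different execution model
```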
So where does SIMT fit into Flynn's taxonomy? SIMT (single instruction, multiple threads) can be viewed as an extension of SIMD. It adds multithreading to SIMD, which improves efficiency as there is less instruction-fetching overhead.
Terminologies for Future Trends in Computer Architecture
These trends are actively being researched and developed by scientists, engineers, and
tech companies around the world.
While some trends, such as quantum computing, are still in the experimental stage,
others like in-memory computing and reconfigurable architecture are already making
their way into practical applications to drive transformative changes across various
industries. Quantum computing could revolutionize fields like cryptography and drug
discovery, while neuromorphic architecture could lead to breakthroughs in artificial
intelligence. In-memory computing could accelerate data-driven insights, and photonic
computing might reshape communication networks. Reconfigurable architecture could
optimize computing resources for different tasks, improving overall efficiency.
1. Quantum computing
Quantum computing is built on quantum bits (qubits), which, unlike classical bits, can exist in superpositions of states; the number of potential states and interactions multiplies exponentially as the complexity of the problem rises. Although it is still in its initial phase, quantum computing has the potential to change industries, including cryptography, banking, and drug discovery. Building a quantum computer can be done in several ways, such as using topological qubits, trapped ions, and superconducting circuits.
2. Neuromorphic architecture
Neuromorphic computing is motivated by the structure and operation of the human brain. It
processes information in a way that is fundamentally distinct from conventional computing by
using specialised hardware and software to replicate the brain's neuronal structure. For instance, because neuromorphic computing relies on analogue rather than digital computations, it may be more energy-efficient. Because it can learn from and adjust to new information in real time, it can also
be more versatile and adaptive. Several computing fields, such as artificial intelligence, robotics,
and sensory processing, stand to benefit from it.
3. In-memory computing
In-memory computing performs computation in or near the memory where the data resides, rather than moving data back and forth between memory and a separate processor, which can greatly accelerate data-intensive workloads.
4. Reconfigurable architecture
Reconfigurable architecture is a computer architecture combining some of the
flexibility of software with the high performance of hardware.
6. Edge computing
This is a distributed computing paradigm that processes data at the network’s edge,
nearer to the data source. Edge computing enables data to be processed and analysed
locally, on devices or systems closer to the source of data generation, rather than
transferring all of the data to a centralised data center or cloud for processing. This
method is frequently applied to decrease latency and speed up data processing.
CPU Pipelining
Microprocessor designers, in an attempt to squeeze every last bit of performance from their
designs, try to make sure that every circuit of the CPU is doing something productive at all times.
The most common application of this practice applies to the execution of instructions. It is
based on the fact that there are steps to the execution of an instruction, each of which uses
entirely different components of the CPU.
Assuming that the execution of a machine code instruction can be broken into three stages:
• Fetch – get the next instruction to execute from its location in memory
• Decode – determine which circuits to energize in order to execute the fetched instruction
• Execute – use the ALU and the processor-to-memory interface to execute the instruction
By comparing the definitions of the different components of the CPU shown with the needs of
these three different stages or cycles, it can be seen that three different circuits are used for these
three tasks.
• The internal data bus and the instruction pointer perform the fetch.
• The instruction decoder performs the decode cycle.
• The ALU and CPU registers are responsible for the execute cycle.
Once the logic that controls the internal data bus is done fetching the current instruction,
what's to keep it from fetching the next instruction? It may have to guess what the next
instruction is, but if it guesses right, then a new instruction will be available to the instruction
decoder immediately after it finishes decoding the previous one.
Once the instruction decoder has finished telling the ALU what to do to execute the current
instruction, what's to keep it from decoding the next instruction while it's waiting for the ALU
to finish? If the internal data bus logic guessed right about what the next instruction is, then the
ALU won't have to wait for a fetch and subsequent decode in order to execute the next
instruction.
This process of creating a queue of fetched, decoded, and executed instructions is called
pipelining, and it is a common method for improving the performance of a processor.
Therefore, a fast processor can be built by increasing the rate at which instructions are executed. This can be achieved by increasing the number of instructions that can be executed simultaneously. Some CPUs break the fetch-decode-execute cycle down into smaller steps, where some of these smaller steps can be performed in parallel. This overlapping speeds up execution, i.e. the CPU fetches and executes simultaneously. This method, used by all current CPUs, is known as pipelining. It is achieved by splitting the microprocessor into two units: (1) the bus interface unit (BIU) and (2) the execution unit (EU). It is a way of improving the processing power of the CPU. The BIU accesses the memory and peripherals while the EU executes instructions. The idea of pipelining is to have more than one instruction being processed by the processor at the same time. Figure 7a
have more than one instruction being processed by the processor at the same time. Figure 7a
shows the time-line sequence of the execution of five instructions on a non-pipelined processor.
Notice how a full fetch-decode-execute cycle must be performed on instruction 1 before instruction 2 can be fetched. This sequential execution of instructions allows for a very simple
CPU hardware, but it leaves each portion of the CPU idle for 2 out of every 3 cycles. During the
fetch cycle, the instruction decoder and ALU are idle; during the decode cycle, the bus interface
and the ALU are idle; and during the execute cycle, the bus interface and the instruction decoder
are idle.
Figure 7b on the other hand shows the time-line sequence for the execution of five
instructions using a pipelined processor. Once the bus interface has fetched
instruction 1 and passed it to the instruction decoder for decoding, it can begin its
fetch of instruction 2.
Notice that the first cycle in the figure only has the fetch operation. The second
cycle has both the fetch and the decode cycle happening at the same time. By the
third cycle, all three operations are happening in parallel.
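The speedup suggested by Figures 7a and 7b can be sketched with simple cycle counts (assuming one cycle per stage, a 3-stage fetch-decode-execute pipeline, and no hazards):

```python
# Compare total cycles for n instructions with and without pipelining.
def cycles_non_pipelined(n, stages=3):
    return n * stages           # each instruction runs all stages alone

def cycles_pipelined(n, stages=3):
    return stages + (n - 1)     # fill the pipe once, then one per cycle

for n in (5, 100):
    print(n, cycles_non_pipelined(n), cycles_pipelined(n))
# 5 instructions: 15 cycles vs 7; 100 instructions: 300 vs 102.
```

As the instruction count grows, the pipelined machine approaches one completed instruction per cycle, a roughly 3x improvement for a 3-stage pipe.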
Pipeline Hazards
1. Resource Hazards. When one instruction is storing a value to memory while another value is being fetched from memory, both need access to memory, and this results in a conflict. A resource hazard occurs when two or more instructions that are already in the pipeline need the same resource. It can also occur when multiple instructions are ready to enter the execute phase and there exists only a single ALU. This can be handled in two ways: (1) instruction execution continues while the instruction fetch waits, or (2) more resources are provided, such as multiple ports into main memory and multiple ALUs.
2. Data Hazards. This happens when the result of one instruction, not yet available, is to be used as an operand for a following instruction. It is a situation where there is a conflict in the access of an operand location, i.e. two or more instructions access a particular register or memory operand. (NB: in sequential processing this is not a problem, but in parallel processing the values will differ.) This can be resolved by altering the flow of execution in a program, i.e. specialized hardware can be used to detect the conflict and route data through special paths that exist between various stages in the pipeline, thereby reducing the time needed for the instruction to access the required operand.
3. Control Hazards. This occurs when the pipeline makes the wrong decision on a branch prediction and brings the wrong instruction into the pipe. A conditional branch instruction makes the address of the next instruction to be fetched unknown. After a conditional branch, predicting the instruction that will be needed next becomes a problem. This may be overcome by (i) rearranging the machine code to cause a delayed branch, or (ii) fetching both the next sequential instruction and the branch-target instruction at the same time and saving the branch decision until it is actually needed, at which time the true execution path will be known.
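The read-after-write situation behind data hazards can be sketched in miniature (a hypothetical two-instruction sequence on a toy register file; register names and values are illustrative):

```python
# I1: R1 <- R2 + R3, then I2: result <- R1 * 2. If I2 reads R1 before
# I1 has written it back -- as can happen in a pipeline without
# forwarding -- I2 computes with a stale value.
regs = {"R1": 0, "R2": 5, "R3": 7}

def i1_execute():                  # I1: R1 <- R2 + R3
    return regs["R2"] + regs["R3"]

# Sequential (correct): I1 writes back before I2 reads.
regs["R1"] = i1_execute()
sequential = regs["R1"] * 2        # I2 sees 12, computes 24

# Overlapped without forwarding: I2 reads R1 too early.
regs["R1"] = 0                     # reset to the pre-I1 state
stale = regs["R1"] * 2             # I2 reads the stale 0, computes 0
regs["R1"] = i1_execute()          # I1's write-back arrives too late

print(sequential, stale)           # 24 0 -- the hazard changed the result
```

Forwarding paths between pipeline stages exist precisely to route I1's result to I2 before the normal write-back completes.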
The first Microprocessor (4004) was designed by Intel Corporation which was founded by Moore
and Noyce in 1968.
In the early years, Intel focused on developing semiconductor memories (DRAMs and EPROMs)
for digital computers.
In 1969, a Japanese calculator manufacturer, Busicom, approached Intel with a design for a small calculator which needed 12 custom chips. Ted Hoff, an Intel engineer, thought that a general-purpose logic device could replace the multiple components.
This idea led to the development of the first so-called microprocessor. So, microprocessors started with the modest beginning of drivers for calculators.
With developments in integration technology, Intel was able to integrate additional chips, like the 8224 clock generator and the 8228 system controller, along with the 8080 microprocessor within a single chip, and released the 8-bit microprocessor 8085 in the year 1976.
The 8085 microprocessor consisted of 6,500 MOS transistors and could work at clock frequencies of 3-5 MHz. Other improved 8-bit microprocessors include the Motorola MC6809, the Zilog Z-80 and the RCA COSMAC.
In 1978, Intel introduced the 16-bit microprocessor 8086, followed by the 8088 in 1979. IBM
selected the Intel 8088 for their personal computer (IBM-PC). The 8086 microprocessor is made up
of 29,000 MOS transistors and could work at a clock speed of 5-10 MHz. It has a 16-bit ALU with
a 16-bit data bus and a 20-bit address bus, so it can address up to 1 MB of address space.
The pipelining concept was used for the first time to improve the speed of the processor. It had
a pre-fetch queue of 6 instruction bytes, wherein the instructions to be executed next were
fetched during the execution of the current instruction. In this sense the 8086 architecture
supports parallel processing.
The 8088 microprocessor is similar to 8086 processor in architecture, but the basic difference is it
has only 8-bit data bus even though the ALU is of 16-bit.
In 1982 Intel released another 16-bit processor called the 80186, designed by a team under the
leadership of Dave Stamm. It had higher reliability and faster operational speed, but at a
lower cost. It had a pre-fetch queue of 6 instruction bytes and was suitable for high-volume
applications such as computer workstations, word-processors and personal computers.
It is made up of 134,000 MOS transistors and could work at clock rates of 4-6 MHz.
Intel released another 16-bit microprocessor, the 80286, having 134,000 transistors, in 1982. It
was used as the CPU in PC-ATs in 1984. It is a second-generation microprocessor, more advanced
than the 80186 processor. It could run at clock speeds of 6 to 12.5 MHz. It has a 16-bit data bus
and a 24-bit address bus, so it can address up to 16 MB of address space and 1 GB of virtual
memory.
Intel introduced the concept of protected mode and virtual mode to ensure proper operation. It
also had an on-chip memory management unit (MMU). This was popularly called the Intel 286 in
those days.
In 1985, Intel released the first 32-bit processor, the 80386, with 275,000 transistors. It has a
32-bit data bus and a 32-bit address bus, so it can address up to a total of 4 GB of memory, plus
a virtual memory space of 64 TB. It could process five million instructions per second and could
work with all popular operating systems, including Windows. It incorporates a concept called
paging in addition to the segmentation technique. It uses a math co-processor called the 80387.
Intel introduced the 80486 microprocessor with a built-in math co-processor and 1.2 million
transistors. It could run at a clock speed of 50 MHz. This is also a 32-bit processor, but it is
twice as fast as the 80386. The additional features in the 486 processor are the built-in cache
and the built-in math co-processor. The address bus here is bidirectional because of the presence
of cache memory.
On 19th October, 1992, Intel released the Pentium-I processor with 3.1 million transistors. The
Pentium thus began the fifth generation of the Intel x86 architecture. The Pentium was backward
compatible while offering new features. The revolutionary technology is that the CPU is able to
execute two instructions at the same time. This is known as superscalar technology. The Pentium
uses a 32-bit expansion bus; however, the data bus is 64 bits.
The 7.5-million-transistor chip, the Intel Pentium II processor, was released in 1997. It works
at a clock speed of 300 MHz. The Pentium II uses Dynamic Execution Technology, which consists
of three different facilities, namely multiple branch prediction, data flow analysis, and a
speculative execution unit. Another important feature is a thermal sensor located on the
motherboard which monitors the die temperature of the processor.
Following the Intel Celeron processors, the Pentium-III processor with 9.5 million transistors
was introduced in 1999. It uses a dynamic execution micro-architecture, a unique combination of
multiple branch prediction, dataflow analysis and speculative execution.
The Pentium III has improved MMX and a processor serial number feature. The improved MMX
enables advanced imaging, 3D, streaming audio and video, speech recognition, and enhanced
Internet facilities.
The Pentium-IV, with 42 million transistors and a 1.5 GHz clock speed, was released by Intel in
November 2000. The Pentium-IV processor has a system bus with 3.2 GB per second of bandwidth.
This high bandwidth is a key benefit for applications that stream data from memory. The
bandwidth is achieved with a 64-bit wide bus capable of transferring data at a rate of 400 MHz.
The Pentium-IV processor enables real-time MPEG2 video encoding and near real-time MPEG4
encoding, allowing efficient video editing and video conferencing.
Intel, with partner Hewlett-Packard, developed the next-generation 64-bit processor architecture
called IA-64. Its first implementation was named Itanium. The Itanium processor, the first in a
family of 64-bit products, was introduced in the year 2001. The Itanium processor was specially
designed to provide a very high level of parallel processing, enabling high performance without
requiring very high clock frequencies. The Itanium processor can handle up to 6 simultaneous
64-bit instructions per clock cycle.
The Itanium II is an IA-64 microprocessor developed jointly by Hewlett-Packard (HP) and Intel
and released on July 8, 2002. It is theoretically capable of performing nearly 8 times more work
per clock cycle than other CISC and RISC architectures, due to its parallel computing
micro-architecture.
Pentium 4EE was released by Intel in the year 2003 and Pentium 4E was released in the year 2004.
The Pentium Dual-Core brand was used for mainstream X86-architecture microprocessors from
Intel from 2006 to 2009. The 64 bit Intel Core2 was released on July 27, 2006. In terms of features,
price and performance at a given clock frequency, Pentium Dual Core processors were positioned
above Celeron but below Core and Core 2 microprocessors in Intel's product range.
The Pentium Dual-Core, which consists of 167 million transistors, was released on January 21,
2007. The Intel Core Duo consists of two cores on one die, a 2 MB L2 cache shared by both cores,
and an arbiter bus that controls the L2 cache.
Core 2 Quad processors are multi-chip modules consisting of two dies similar to those used in
Core 2 Duo, forming a quad-core processor.
In September 2009, new Core i7 models based on the Lynnfield desktop quad-core processor and
the Clarksfield quad-core mobile processor were added. The first six-core processor in the Core
lineup was the Gulftown, launched on March 16, 2010. Both the regular Core i7 and the Extreme
Edition are advertised as five stars in the Intel Processor Rating.
The salient features of the 8086 microprocessor are:
– It is a 16-bit microprocessor.
– The 8086 has a 20-bit address bus and can access up to 2^20 memory locations (1 MB).
– It can support up to 64K I/O ports.
– It provides fourteen 16-bit registers.
– It has a multiplexed address and data bus: AD0-AD15 and A16-A19.
– It requires a single-phase clock with a 33% duty cycle to provide internal timing.
– The 8086 is designed to operate in two modes, Minimum and Maximum.
– It can prefetch up to 6 instruction bytes from memory and put them in an instruction queue in
order to speed up instruction execution.
– It requires a +5V power supply.
– It comes in a 40-pin dual in-line package.
The 8086 employs parallel processing. The 8086 has 2 parts which operate at the same time: the
bus interface unit (BIU) and the execution unit (EU), as seen in Figure 8 below.
– The BIU performs all bus operations, such as instruction fetching, reading and writing of
operands from and to memory, and calculating the addresses of memory operands.
– The instruction bytes are transferred to the instruction queue.
– It provides a full 16 bit bidirectional data bus and 20 bit address bus.
– The bus interface unit is responsible for performing all external bus operations.
Specifically it has the following functions:
– Instruction fetch, instruction queuing, operand fetch and storage, address calculation,
relocation and bus control.
– The BIU uses a mechanism known as an instruction queue to implement a pipeline architecture.
The BIU contains the following registers:
The BIU fetches instructions using the CS and IP registers, written CS:IP, to construct the 20-bit address.
Data is fetched using a segment register (usually the DS) and an effective address (EA) computed
by the EU depending on the addressing mode.
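The way a 20-bit physical address is formed from a segment:offset pair can be sketched in plain Python (an illustrative model of the calculation, not actual 8086 hardware):

```python
def physical_address(segment: int, offset: int) -> int:
    """Form an 8086-style 20-bit physical address from segment:offset.

    The 16-bit segment value is shifted left by 4 bits (multiplied by 16)
    and the 16-bit offset is added; the result wraps to 20 bits.
    """
    return (((segment & 0xFFFF) << 4) + (offset & 0xFFFF)) & 0xFFFFF

# e.g. CS:IP = 1000h:0100h gives the physical address 10100h.
assert physical_address(0x1000, 0x0100) == 0x10100
```

Note the wrap-around at the 1 MB boundary: FFFFh:0010h wraps to address 00000h.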
– The EU extracts instructions from the top of the queue in the BIU, decodes them, generates
operand addresses if necessary, passes them to the BIU and requests it to perform the read or
write bus cycles to memory or I/O, and performs the operation specified by the instruction on
the operands.
– During the execution of the instruction, the EU tests the status and control flags and updates
them based on the results of executing the instruction.
– If the queue is empty, the EU waits for the next instruction byte to be fetched and shifted to
the top of the queue.
– When the EU executes a branch or jump instruction, it transfers control to a location
corresponding to another set of sequential instructions.
– Whenever this happens, the BIU automatically resets the queue and then begins to fetch
instructions from this new location to refill the queue.
MODULE TWO
An instruction set architecture (ISA), is the part of the computer architecture related to
programming, including the native data types, instructions, registers, addressing modes,
memory architecture, interrupt and exception handling, and external I/O. The ISA also includes
a specification of the set of opcodes (machine language) - the native commands for a particular
processor. The ISA is the hardware-software interface.
An instruction set is a list of all the instructions that a processor can execute.
ISA is at the interface between software and hardware. It is an abstraction which hides
hardware complexity from software through a set of operations and devices. One of the
crucial features of any processor is its instruction set, i.e. the set of machine code
instructions that the processor can carry out. Each processor has its own unique instruction
set specifically designed to make best use of the capabilities of that processor. The actual
number of instructions provided ranges from a few dozen for a simple 8-bit
microprocessor to several hundred for a 32-bit VAX (Virtual Address eXtension) processor.
However, it should be pointed out that a large instruction set does not necessarily imply a
more powerful processor.
Instruction set architecture (ISA) describes the processor in terms of what the
programmer sees, i.e. the instructions and registers. Two machines may have the same
ISA, but different organizations. Organization is concerned with the internal design of
the processor, the design of the bus system and its interfaces, the design of memory and
so on. Two machines with the same organization may have different hardware
implementations.
iv. A reference to the next instruction to be fetched and executed. The next
instruction to be executed is normally the instruction immediately following the current
instruction in memory. Therefore, no explicit reference to the next instruction is
provided.
Where are those operands located? In the memory or in the CPU registers or in the I/O
device.
If the operands are located in the registers then an instruction can be executed faster than
that of the operands located in the memory. The main reason here is that memory access
time is higher in comparison to the register access time.
i. Data Processing Instructions: These instructions are used for arithmetic and logic
operations in a machine. Examples of data processing instructions are: arithmetic,
Boolean, shift, character and string processing instructions, stack and register
manipulation instructions, vector instructions, etc.
ii. Data Storage/Retrieval Instructions: Since the data processing operations are
normally performed on the data stored in CPU registers, we need instructions to bring
data to and from memory to registers. These are called data storage/retrieval instructions.
Examples of data storage and retrieval instructions are load and store instructions.
iii. Data Movement Instructions: These are basically input/output instructions. They
are required to bring in programs and data from various devices to memory or to
communicate the results to the input/output devices. Some of these instructions can be:
start, halt, test etc.
iv. Control Instructions: These instructions are used for testing the status of
computation through the Processor Status Word (PSW) and for altering the flow of control;
branch instructions are a typical example.
Instruction set design is the most complex yet interesting and very much analyzed aspect
of computer design. The instruction set plays an important role in the design of the CPU
as it defines many functions of it. Since instruction sets are the means by which a
programmer can control the CPU, therefore, users’ views must be considered while
designing the instruction set.
iii. What should be the instruction format? This includes issues like:
- instruction length,
- number of addresses,
iv. What is the number of registers which can be referenced by an instruction and how are they
used?
These are:
Addresses: Addresses are treated as a form of data which is used in the calculation of
actual physical memory address of an operand. In most of the cases, the addresses
provided in instruction are operand references and not the actual physical memory
addresses.
Numbers: All machines provide numeric data types. One special feature of numbers used
in computers is that they are limited in magnitude, and hence the underflow and overflow
may occur during arithmetical operations on these numbers. The maximum and
minimum magnitude is fixed for an integer number while a limit of precision of numbers
and exponent exist in the floating point numbers. The three numeric data types which are
common in computers are:
Characters: Another very common data type is the character, or string of characters. The
most widely used character representation is ASCII (American Standard Code for
Information Interchange). It has 7 bits for coding each data pattern, which implies 128
different characters.
Some of these characters are control characters which may be used in data
communication. The eighth bit of ASCII may be used as a parity bit. One special feature
of ASCII, which facilitates the conversion between a 7-bit ASCII digit and a 4-bit packed
decimal digit, is that the low-order four bits of the ASCII codes for the digits 0-9 are the
binary equivalents of those digits.
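This property of the ASCII digit codes can be checked directly (a quick illustration in Python):

```python
# The ASCII codes for '0'..'9' are 30h..39h, so the low-order four
# bits of each code are exactly the binary value of the digit.
for digit in "0123456789":
    assert ord(digit) & 0x0F == int(digit)

# e.g. '7' is coded as 37h; masking off the high bits leaves 7.
assert ord("7") == 0x37
```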
Logical Data: In general a data word or any other addressable unit such as byte, half word
etc. are treated as a single unit of data. But can we consider an n-bit data unit consisting
of n items of 1 bit each? If we treat each bit of an n-bit data as an item then it can be
considered to be logical data. Each of these n items can have a value 0 or 1. What are the
advantages of such a bit oriented view of data? The advantages of such a view will be:
Instruction Format
Therefore, any instruction issued by the processor must carry at least two types of
information.
These are the operation to be performed, encoded in what is called the op-code field, and
the address information of the operand on which the operation is to be performed,
encoded in what is called the address field.
i. three-address,
ii. two-address,
iii. one-and-half-address,
iv. one-address, and
v. zero-address.
The original contents of register R2 are lost due to this operation while the original
contents of register R1 remain intact.
A similar instruction that uses memory locations instead of registers can take the form
ADD A, B. In this case, the contents of memory location A are added to the contents of
memory location B and the result overwrites the original contents of memory
location B.
In this case the instruction implicitly refers to a register, called the accumulator Racc,
such that the contents of the accumulator are added to the contents of the register R1 and
the result is stored back into the accumulator Racc.
If a memory location is used instead of a register, then an instruction of the form ADD B
is used. In this case, the instruction adds the content of the accumulator Racc to the
content of memory location B and stores the result back into the accumulator Racc.
Between the two- and the one-address instruction, there can be a one-and-half
address instruction.
Consider, for example, the instruction ADD B, R1. In this case, the instruction adds the
contents of register R1 to the contents of memory location B and stores the result in register
R1.
Owing to the fact that the instruction uses two types of addressing, that is, a register and
a memory location, it is called a one-and-half-address instruction: register addressing
needs a smaller number of bits than memory addressing, so the register operand counts as
only "half" an address.
Zero-address instructions:
These are instructions that use stack operations. A stack is a data organization
mechanism in which the last data item stored is the first data item retrieved. Two specific
operations can be performed on a stack: the push and the pop operations. A
special register, called the stack pointer (SP), is used to indicate the stack location that can
be addressed. The classes of instructions are summarized in the table below.
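To make the zero-address idea concrete, here is a minimal stack-machine sketch in Python (illustrative only; the opcode names ADD and MUL are invented for the example):

```python
def run_stack(program):
    """Evaluate a zero-address (stack) program.

    Numbers in the program are pushed; "ADD" and "MUL" pop their two
    operands from the stack and push the result, so no operand
    addresses appear in the instructions themselves.
    """
    stack = []
    for op in program:
        if op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:  # a literal operand: push it
            stack.append(op)
    return stack.pop()

# Computes (2 + 3) * 4 entirely through push/pop operations.
assert run_stack([2, 3, "ADD", 4, "MUL"]) == 20
```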
The main memory can be modeled as an array of millions of adjacent cells, each
capable of storing a binary digit (bit) having a value of 1 or 0. These cells are organized in
the form of groups of a fixed number, say n, of cells that can be dealt with as an entity.
An entity consisting of 8 bits is called a byte. Each such group of cells is identified by a
unique address. This address will be used to determine the location in the memory in which
a given word is to be stored. This is called a memory WRITE operation. Similarly, the address
will be used to determine the memory location from which a word is to be retrieved from the
memory. This is called a memory READ operation.
During a memory write operation a word is stored into a memory location whose address
is specified. During a memory read operation a word is read from a memory location
whose address is specified. Typically, memory read and memory write operations are
performed by the central processing unit (CPU). The 3 basic steps needed in order for the
CPU to perform a write operation into a specified memory location:
1. The word to be stored into the memory location is first loaded by the CPU into a
specified register, called the memory data register (MDR).
2.The address of the location into which the word is to be stored is loaded by
the CPU into a specified register, called the memory address register (MAR).
3. A WRITE signal is issued by the CPU, indicating that the word stored in the MDR is to
be stored in the memory location whose address is loaded in the MAR.
Similar to the write operation, three basic steps are needed in order to perform a memory
read operation:
1. The address of the location from which the word is to be read is loaded into the
MAR.
2. A READ signal is issued by the CPU, indicating that the word whose address is in
the MAR is to be read into the MDR.
3. After some time, corresponding to the memory delay in reading the specified
word, the required word will be loaded by the memory into the MDR ready for
use by the CPU.
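The three-step write and read sequences above can be mimicked in a small Python model (the Memory class here is purely illustrative, not real hardware behaviour):

```python
class Memory:
    """A toy memory in which MAR and MDR mediate every access."""

    def __init__(self, size=16):
        self.cells = [0] * size
        self.mar = 0  # memory address register
        self.mdr = 0  # memory data register

    def write(self, address, word):
        self.mdr = word                  # 1. word -> MDR
        self.mar = address               # 2. address -> MAR
        self.cells[self.mar] = self.mdr  # 3. WRITE signal stores MDR

    def read(self, address):
        self.mar = address               # 1. address -> MAR
        self.mdr = self.cells[self.mar]  # 2-3. READ signal; word -> MDR
        return self.mdr

mem = Memory()
mem.write(5, 42)
assert mem.read(5) == 42
```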
FETCH-EXECUTE CYCLE
Fetch and Execute are the fundamental operations of the processor. The fetch-
decode-execute cycle represents the steps that a computer follows to run a program. The
program which is to be executed is a set of instructions that is stored in the memory, hence,
the CPU executes the instructions that it finds in the computer’s memory. In order to
execute an instruction;
- the CPU must first fetch (transfer) the instruction from memory into one of its registers;
- the CPU then decodes the instruction, i.e. it decides which instruction has been fetched; and
- finally it executes (carries out) the instruction.
The CPU then repeats this procedure, i.e. it fetches an instruction, decodes and executes
it. This process is repeated continuously and is known as the fetch-execute cycle.
This cycle begins when the processor is switched on and continues until the CPU is halted
(via a halt instruction, e.g. the 8086 HLT instruction, or the machine is switched off). The
fetch-execute cycle operates by first fetching an instruction.
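The cycle can be sketched as a loop in Python (a toy machine with made-up LOAD/ADD opcodes, shown only to illustrate the fetch-decode-execute sequence):

```python
def run_program(memory):
    """Repeatedly fetch, decode and execute until a HLT is reached."""
    pc, acc = 0, 0                # program counter and accumulator
    while True:
        ir = memory[pc]           # fetch: IR <- memory[PC]
        pc += 1                   # PC now points to the next instruction
        opcode, operand = ir      # decode
        if opcode == "LOAD":      # execute
            acc = operand
        elif opcode == "ADD":
            acc += operand
        elif opcode == "HLT":     # halt, like the 8086 HLT instruction
            return acc

assert run_program([("LOAD", 7), ("ADD", 5), ("HLT", 0)]) == 12
```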
Instruction Fetch
An instruction fetch involves the reading of an instruction from the memory location(s)
to the CPU. The execution of this instruction may involve several operations, depending
on the nature of the instruction. The processing needed for a single instruction (fetch and
execution) is referred to as an instruction cycle
- The Program Counter (PC) keeps track of the instruction that is to be executed next
after the execution of an on-going instruction. i.e. PC always contains the address
of the next instruction to be executed. A program counter is used for a fetch cycle
in a typical CPU.
- The instructions are loaded into the Instruction Register (IR), before their
execution. i.e. the IR holds the instruction to be executed.
Instruction Execution
The instruction execution takes place in the CPU registers. The following are CPU
registers:
• Memory Address Register (MAR): It specifies the address of the memory location
from which the data or instruction is to be accessed (for a read operation) or to
which the data is to be stored (for a write operation).
• Program Counter (PC): It keeps track of the instruction that is to be executed next,
after the execution of an on-going instruction.
• Instruction Register (IR): the instructions are loaded here before their execution.
Table 4 (a) to (d) shows the evolution and how microprocessors have grown faster and
much more complex.
It is worthwhile to list some of the highlights of the evolution of the Intel product line:
- 8080: The world's first general-purpose microprocessor. This was an 8-bit machine,
with an 8-bit data path to memory. The 8080 was used in the first personal computer, the
Altair.
- 8086: A far more powerful, 16-bit machine. In addition to a wider data path and
larger registers, the 8086 sported an instruction cache, or queue, that prefetches a few
instructions before they are executed. A variant of this processor, the 8088, was used in
IBM’s first personal computer, securing the success of Intel.
-80286: This extension of the 8086 enabled addressing a 16-MB memory instead of just 1
MB.
-80386: Intel’s first 32-bit machine, and a major overhaul of the product. With a 32-bit
architecture, the 80386 rivaled the complexity and power of minicomputers and
mainframes introduced just a few years earlier. This was the first Intel processor to support
multitasking, meaning it could run multiple programs at the same time.
-80486: The 80486 introduced the use of much more sophisticated and powerful cache
technology and sophisticated instruction pipelining. The 80486 also offered a built-in math
coprocessor, offloading complex math operations from the main CPU.
-Pentium: With the Pentium, Intel introduced the use of superscalar techniques, which
allow multiple instructions to execute in parallel.
Pentium Pro: The Pentium Pro continued the move into superscalar
organization begun with the Pentium, with aggressive use of register
renaming, branch prediction, data flow analysis, and speculative
execution.
-Pentium II: The Pentium II incorporated Intel MMX technology, which is designed
specifically to process video, audio, and graphics data efficiently.
-Pentium III: The Pentium III incorporates additional floating point instructions: The
Streaming SIMD Extensions (SSE) instruction set extension added 70 new instructions
designed to increase performance when exactly the same operations are to be performed
on multiple data objects. Typical applications are digital signal processing and graphics
processing.
-Pentium 4: The Pentium 4 includes additional floating point and other enhancements for
multimedia.
-Core: This is the first Intel x86 microprocessor with a dual core, referring to the
implementation of two cores on a single chip.
- Core 2: The Core 2 extends the Core architecture to 64 bits. The Core 2 Quad provides
four cores on a single chip. More recent Core offerings have up to 10 cores per chip. An
important addition to the architecture was the Advanced Vector Extensions instruction
set that provided a set of 256-bit, and then 512-bit, instructions for efficient processing of
vector data.
Although the organization and technology of the x86 machines have changed dramatically
over the decades, the instruction set architecture has evolved to remain backward
compatible with earlier versions. Thus, any program written on an older version of the x86
architecture can execute on newer versions.
MODULE THREE
Registers are extremely fast memory locations within the CPU that are used to create and
store the results of CPU operations and other calculations. Computers differ in register
sets, number of registers, register types, and the length of each register. They also differ in
the usage of each register.
General-purpose registers can be used for multiple purposes and assigned to a variety of
functions by the programmer.
Special-purpose registers are restricted to only specific functions. In some cases, some
registers are used only to hold data and cannot be used in the calculations of operand
addresses.
• The 8086 has the following groups of user-accessible internal registers. These are:
- Instruction Pointer (IP)
- Four general-purpose registers (AX, BX, CX, DX)
- Four pointer and index registers (SP, BP, SI, DI)
- Four segment registers (CS, DS, SS, ES)
- Flag Register (FR)
• The 8086 has a total of fourteen 16-bit registers, including a 16-bit status register (flag
register), with 9 of its bits implemented as status and control flags.
Segment Registers
1) Code segment (CS) is a 16-bit register containing address of 64 KB segment with
processor instructions. The processor uses CS segment for all accesses to instructions
referenced by instruction pointer (IP) register.
2) Stack segment (SS) is a 16-bit register containing address of 64KB segment with
program stack. By default, the processor assumes that all data referenced by the stack
pointer (SP) and base pointer (BP) registers is located in the stack segment. SS register can
be changed directly using POP instruction.
3) The Data and Extra segments (DS and ES) are 16-bit registers containing the addresses of
64KB segments with program data. By default, the processor assumes that all data referenced by
the general registers (AX, BX, CX, and DX) and the index registers (SI, DI) is located in the
data and extra segments.
Data Registers
1) AX (Accumulator)
• It consists of two 8-bit registers, AL and AH, which can be combined together and used
as a 16-bit register AX. AL in this case contains the low-order byte of the word, and AH
contains the high-order byte.
• The accumulator can be used for I/O operations and string manipulation.
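The pairing of AH and AL into AX can be illustrated with a little bit-manipulation (plain Python, for illustration only):

```python
def make_ax(ah: int, al: int) -> int:
    """Combine the high byte AH and low byte AL into the 16-bit AX."""
    return ((ah & 0xFF) << 8) | (al & 0xFF)

def split_ax(ax: int):
    """Split the 16-bit AX back into its (AH, AL) byte pair."""
    return (ax >> 8) & 0xFF, ax & 0xFF

assert make_ax(0x12, 0x34) == 0x1234
assert split_ax(0x1234) == (0x12, 0x34)
```

The same high/low pairing applies to BX (BH/BL), CX (CH/CL) and DX (DH/DL) below.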
2) BX (Base register)
• It consists of two 8-bit registers, BL and BH, which can be combined together and
used as a 16-bit register BX. BL in this case contains the low-order byte of the word, and
BH contains the high-order byte.
• The BX register usually contains an offset for the data segment.
3) CX (Count register)
• It consists of two 8-bit registers, CL and CH, which can be combined together and
used as a 16-bit register CX. When combined, the CL register contains the low-order byte of
the word, and CH contains the high-order byte.
• Count register can be used in Loop, shift/rotate instructions and as a counter in string
manipulation.
4) DX (Data register)
• It consists of two 8-bit registers, DL and DH, which can be combined together and
used as a 16-bit register DX. When combined, the DL register contains the low-order byte of
the word, and DH contains the high-order byte.
• DX can be used as a port number in I/O operations.
• In integer 32-bit multiply and divide instruction the DX register contains high-order
word of the initial or resulting number.
Pointer register
1. Stack Pointer (SP) is a 16-bit register used to hold the offset address for the stack segment.
2. Base Pointer (BP) is a 16-bit register used to hold the offset address for the stack segment.
i. BP register is usually used for based, based indexed or register indirect
addressing.
ii. The difference between SP and BP is that the SP is used internally to store the
return address in the case of interrupts and the CALL instruction.
3. Source Index (SI) and Destination Index (DI). These two 16-bit registers are used to hold
the offset address for DS and ES in the case of string manipulation instructions.
i. SI is used for indexed, based indexed and register indirect addressing, as well as a
source data addresses in string manipulation instructions.
ii. DI is used for indexed, based indexed and register indirect addressing, as well as
a destination data addresses in string manipulation instructions.
Instruction Pointer (IP)
It is a 16-bit register. It acts as a program counter and is used to hold the offset address
for CS.
Flag Register
The flag register is a 16-bit register containing 9 one-bit flags, i.e. the flag register is
addressable by bit, as shown in the figure below. Each bit depicts a status flag of the
microprocessor.
- Carry Flag (CF): This flag is set when there is a carry out of the MSB in the case of addition,
or a borrow in the case of subtraction. For example, when two numbers are added, a carry may be
generated out of the most significant bit position. The carry flag, in this case, will be set to
'1'. In case no carry is generated, it will be '0'.
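The carry-flag behaviour for 16-bit addition can be modelled directly (an illustrative sketch, not processor code):

```python
def add16(a: int, b: int):
    """Add two 16-bit values, returning (16-bit result, carry flag)."""
    total = (a & 0xFFFF) + (b & 0xFFFF)
    carry = 1 if total > 0xFFFF else 0  # carry out of the MSB
    return total & 0xFFFF, carry

assert add16(0x0001, 0x0002) == (0x0003, 0)  # no carry: CF = 0
assert add16(0xFFFF, 0x0001) == (0x0000, 1)  # carry out of MSB: CF = 1
```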
Some registers are used in memory references. Two registers are essential in memory write and
read operations: the memory data register (MDR) and memory address register (MAR). The MDR
and MAR are used exclusively by the CPU and are not directly accessible to programmers
Two main registers are involved in fetching an instruction for execution: the program counter
(PC) and the instruction register (IR). The PC is the register that contains the address of the next
instruction to be fetched. The fetched instruction is loaded in the IR for execution. After a
successful instruction fetch, the PC is updated to point to the next instruction to be executed.
In the case of a branch operation, the PC is updated to point to the branch target instruction after
the branch is resolved, that is, the target address is known.
Condition Registers
Condition registers, or flags, are used to maintain status information. Some architectures
contain a special program status word (PSW) register. The PSW contains bits that are set by the
CPU to indicate the current status of an executing program. These indicators are typically for
arithmetic operations, interrupts, memory protection information, or processor status.
- Index Register, used in index addressing. The address of the operand is obtained by
adding a constant to the content of a register, called the index register. The index register
holds an address displacement. Index addressing is indicated in the instruction by
including the
name of the index register in parentheses and using the symbol X to indicate the constant to
be added.
- Segment Pointers. To support segmentation, the address issued by the processor consists of
a segment number (base) and a displacement (or an offset) within the segment. A segment
register holds the address of the base of the segment.
- Stack Pointer. A stack is a data organization mechanism in which the last data item stored
is the first data item retrieved. Two specific operations can be performed on a stack. These
are the Push and the Pop operations. The stack pointer (SP) is used to indicate the stack
location that can be addressed. In the stack push operation, the SP value is used to indicate
the location (called the top of the stack).
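The Push and Pop operations can be sketched as follows. This models a descending stack like the 8086's, where PUSH decrements SP before storing a 16-bit word and POP reads before incrementing; the initial SP value here is an arbitrary choice for the example.

```python
# Sketch of a descending stack: PUSH decrements SP then stores,
# POP reads then increments. SP always indicates the top of the stack.
memory = {}
sp = 0x100  # hypothetical initial stack pointer

def push(value):
    global sp
    sp -= 2              # 16-bit words, so SP moves by 2
    memory[sp] = value

def pop():
    global sp
    value = memory[sp]
    sp += 2
    return value

push(0x1234)
push(0xABCD)
print(hex(pop()))  # 0xabcd  (last in, first out)
print(hex(pop()))  # 0x1234
```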
ADDRESSING MODES
These are the different ways in which operands can be addressed. Addressing modes differ in
the way the address information of operands is specified. The basic addressing modes are:
i. IMMEDIATE ADDRESSING
- The operand is given explicitly as part of the instruction, so no memory access is required;
the operand may follow immediately after the instruction. In this addressing mode, the
value of the operand is (immediately) available in the instruction itself. For example, consider
loading the decimal value 9000 into a register Ri. This operation can be performed using an
instruction LOAD 9000, Ri.
In this instruction, the operation to be performed is to load a value into a register. The
source operand is (immediately) given as 9000, and the destination is the register Ri.
ii. DIRECT (ABSOLUTE) ADDRESSING
- The address of operand is given explicitly as part of the instruction. According to this
addressing mode, the address of the memory location that holds the operand is included in the
instruction.
For example, consider loading the value of the operand stored in memory location 5000 into
register DH. This operation can be performed using an instruction such as MOV DH, [5000].
In this instruction, the source operand is the value stored in the memory location whose address
is 5000, and the destination is the register DH.
iii. REGISTER INDIRECT ADDRESSING
- The effective address of the operand is in the register or main memory location whose address
appears in the instruction. The name of a register or a memory location that holds the (effective)
address of the operand is included in the instruction. In order to indicate the use of indirection,
it is customary to enclose the name of the register or the memory location in parentheses
(brackets, in x86 syntax). For example, the instruction MOV CL, [SI] moves the content of the
address indicated by SI into CL, i.e. the address of the operand is held in register SI.
iv. INDEXED ADDRESSING
- The effective address (EA) of the operand is generated by adding an index register value (X) to
the direct address (DA),
i.e. EA = X + DA
In this addressing mode, the address of the operand is obtained by adding a constant to the
content of a register, called the index register. For example, the instruction MOV AX, [DI + 5]
loads register AX with the contents of the memory location whose address is the
sum of the contents of register DI and the value 5. Index addressing is indicated in the instruction
by including the name of the index register in parentheses and the symbol X to indicate the
constant to be added.
v. BASE RELATIVE ADDRESSING
The effective address of the operand is generated by adding a constant (displacement) to the
content of a base register indicated in the instruction; i.e. the address of the operand is
obtained by adding the displacement to the content of the base register.
The Intel IA-32 architecture comprises the 80x86/Pentium processors, where x ≥ 3. Its features
include:
Increased data bus from 16bits to 32 bits
IA-32 processors are 32 bit integrated processors that can operate on integer and floating
point data
It is backward compatible with 16 bit 8086 in real mode
IA-32 operates in real mode by default, hence it has to be switched to protected mode
Pentium II processors, as a family of IA-32, support MMX, i.e. multimedia extensions,
which are SIMD (single instruction, multiple data) in nature
The Intel 80x86 extends the 4 general purpose registers, the pointer registers and the index
registers to 32 bits, and adds 2 extra segment registers, FS and GS
8 of 32 bit Registers (General Purpose Registers, GPRs): EAX, EBX, ECX, EDX, EBP, ESP,
ESI, EDI;
All the general registers (16-bit/8-bit) of the 8086 (AX, BX, CX, DX, BP, SP, SI, DI, AH, AL,
BH, BL, CH, CL, DH, DL), the 16-bit IP, and the 16-bit Flags;
IA-32 increased the address bus (20 bits on the 8086) to 32 bits, so that the physically
addressable memory is 2^32 bytes (4 GB)
The extended registers can all be used as pointers (offsets into the segment addressed by the
DS register), unlike the 8086, where only SI, DI, BP and SP can be used as pointers.
Byte, Word, Double Word, Single precision floating point, Double precision floating point,
(i) Immediate addressing (ii) Register addressing (iii) Direct addressing (iv) Register indirect
addressing (v) Relative base addressing (vi) Relative index addressing (vii) Based index addressing
▪ Memory address of the operand is pointed to by the register contents of either a base register
(BX, BP), an index register (SI, DI), or any of the general purpose 32 bit registers (EAX, EBX, ECX,
EDX, EBP, ESI, EDI)
-is a register indirect addressing where a combination of (base + index) registers is used as an
operand memory address pointer.
- any pair of the general purpose 32 bit registers (EAX, EBX, ECX, EBP, EDI, ESI) can be used;
the first register of the pair is treated as the base and the second as the index.
- Same as register relative addressing except that the displacement is added to a (base + index)
register pair to form the effective memory address of the operand. Any pair of general purpose
registers + displacement can be used.
-the second general purpose register is scaled by a factor of 1, 2, 4, or 8 and added to another
general purpose register to form a memory pointer for the operand.
Scaled index addressing mode allows easy access to multidimensional arrays. In this addressing
mode, any of the 32-bit registers except ESP can be used as a pointer, which is multiplied by a
scale factor of 1, 2, 4 or 8, corresponding to byte, word, doubleword and quadword operands.
Note that only the 32-bit registers can be used in scaled index addressing mode; a 16-bit register
cannot be used as a scaled index.
Example: Find the effective address in each of the following cases. Assume that ESI =
200h, ECX =100h, EBX = 50h and EDI = 100h.
1. MOV AX, [2000 + ESI*4]
2. MOV AX, [5000 + ECX*2]
3. MOV ECX, [2400 + EBX*4]
4. MOV DX, [100 + EDI*8]
Solution:
1. 2000h + 200h x 4 = 2000h + 800h = 2800h. Therefore the address of the operand moved into
AX is DS:2800h
2. 5000h + 100h x 2 = 5000h + 200h = 5200h, i.e. DS:5200h
3. 2400h + 50h x 4 = 2400h + 140h = 2540h, i.e. DS:2540h
4. 100h + 100h x 8 = 100h + 800h = 900h, i.e. DS:0900h
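The arithmetic in the example above can be checked with a short script. The helper function and register dictionary here are illustrative only, mirroring the rule EA = displacement + index x scale.

```python
# Computing scaled-index effective addresses: EA = displacement + index * scale.
def effective_address(displacement, index, scale):
    return displacement + index * scale

# Register values as given in the example.
regs = {"ESI": 0x200, "ECX": 0x100, "EBX": 0x50, "EDI": 0x100}

print(hex(effective_address(0x2000, regs["ESI"], 4)))  # 0x2800
print(hex(effective_address(0x5000, regs["ECX"], 2)))  # 0x5200
print(hex(effective_address(0x2400, regs["EBX"], 4)))  # 0x2540
print(hex(effective_address(0x100, regs["EDI"], 8)))   # 0x900
```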
The flag bits affected by the ADD instruction are carry flag (CF), parity flag (PF), auxiliary carry
flag (AF), zero flag (ZF), sign flag (SF) and overflow flag (OF).
CF – This flag is set whenever there is a carry out, either from d7 after an 8-bit operation or from
d15 after a 16-bit data operation.
PF – After certain operations, the parity of the result's low-order byte is checked. If the byte has
an even number of 1s, the parity flag is set to 1; otherwise it is cleared, i.e. 0. Parity is checked for
the lower 8 bits only in a 16-bit operation.
AF – If there is a carry from d3 to d4 of an operation, this bit is set, otherwise it is cleared.
ZF – Set to 1 if the result of an arithmetic or logical operation is zero; otherwise, it is cleared.
SF – The binary representation of signed numbers uses the most significant bit as the sign bit.
After arithmetic or logic operations, the status of this sign bit is copied into the SF, thereby
indicating the sign of the result.
OF – is set whenever the result of a signed number operation is too large, causing the high order
bit to overflow into the sign bit.
Examples: Show how the flag register is affected by the addition of 38h and 2Fh in the following
lines of code: MOV BH, 38h ; ADD BH, 2Fh
  38h   0011 1000
+ 2Fh   0010 1111
  67h   0110 0111
CF = 0 (no carry out of d7), PF = 0 (0110 0111 has five 1s, odd parity), AF = 1 (8h + Fh produces
a carry from d3 to d4), ZF = 0, SF = 0, OF = 0.
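A sketch of how these flag bits can be computed for an 8-bit ADD, using the same 38h + 2Fh operands. The overflow test used here (the operands share a sign that differs from the result's sign) is a standard formulation, not something specific to this note.

```python
# Computing the 8086 status flags for an 8-bit ADD.
def add8_flags(a, b):
    total = a + b
    result = total & 0xFF
    flags = {
        "CF": int(total > 0xFF),                     # carry out of d7
        "PF": int(bin(result).count("1") % 2 == 0),  # even parity of low byte
        "AF": int((a & 0xF) + (b & 0xF) > 0xF),      # carry from d3 to d4
        "ZF": int(result == 0),                      # result is zero
        "SF": int(result >> 7),                      # copy of the sign bit
        # OF: same-sign operands produced a result with a different sign
        "OF": int((~(a ^ b) & (a ^ result) & 0x80) != 0),
    }
    return result, flags

result, flags = add8_flags(0x38, 0x2F)
print(hex(result), flags)
# 0x67 {'CF': 0, 'PF': 0, 'AF': 1, 'ZF': 0, 'SF': 0, 'OF': 0}
```

The same function answers worked example 3 below: add8_flags(0xB5, 0x96) gives result 4Bh with CF = 1 and OF = 1.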
3. How would the status flags be set after the processor performed
the 8-bit addition of 10110101₂ and 10010110₂?
10110101 + 10010110 = (1) 01001011, i.e. the result is 01001011 with a carry out:
CF = 1, PF = 1 (01001011 has four 1s, even parity), AF = 0, ZF = 0, SF = 0,
OF = 1 (two negative operands produced a positive result).
Exercise:
Logical Instructions
- AND destination, Source
E.g. MOV BL, 35h
AND BL, 0Fh
35h   0011 0101
0Fh   0000 1111
05h   0000 0101
- OR destination, source
OR AX, DA68h
Logical Shift – Right (SHR) and Left (SHL). E.g. show the result of SHR in the following instructions.
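The effect of AND, OR and SHR can be mirrored with Python's bitwise operators. The initial AX value and the SHR operand below are made-up assumptions, since the note does not give them.

```python
# AND clears the bits that are 0 in the mask (cf. AND BL, 0Fh above).
bl = 0x35
bl &= 0x0F
print(hex(bl))          # 0x5

# OR sets the bits that are 1 in the mask (cf. OR AX, 0DA68h above).
ax = 0x3100             # hypothetical initial AX value (not in the note)
ax |= 0xDA68
print(hex(ax))          # 0xfb68

# SHR: logical shift right; zeros are shifted in from the left.
val = 0x9A              # hypothetical operand
val >>= 3
print(hex(val))         # 0x13
```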
MODULE FOUR
Numbering Systems
A single transistor can only remember one of two possible numbers, a one or a zero. This
isn't useful for anything more complex than controlling a light bulb, so for larger values,
transistors are grouped together so that their combinations of ones and zeros can be used to
represent larger numbers.
Some of the methods that are used to represent numbers with groups of transistors or bits are
hereby discussed in number system.
A number system uses a specific radix (base). Radices that are powers of 2 are widely used in digital
systems. These radices include binary (base 2), quaternary (base 4), octal (base 8), and
hexadecimal (base 16). The base 2 binary system is dominant in computer systems.
Binary notation directly represents digital logic states. Hexadecimal numbering system is a
convenient means of representing binary numbers because 1 hexadecimal digit represents four
binary digits. The letters A-F are borrowed for use as hexadecimal digits beyond 9.
The table below shows the relationship between base 2, 10 and 16. Note that in each base,
when one more is added to the highest digit, that digit becomes zero and a 1 is carried to the
next highest digit position. i.e. when all the 3 columns are completed, a carry remains which is
pushed to a fourth column.
Increasing powers of 2 form the weights of the binary digits, i.e. 2^n where n = 0, 1, …, N. The first
few values of the decimal, binary and hexadecimal equivalents are shown in Table 5.
Conversion from decimal to binary and from binary to decimal can be done using the weight
that is associated with each binary bit position. Examples;
1. 191₁₀ = 10111111₂ can be converted to hexadecimal by grouping the 8 bits into two nibbles
and representing each nibble with a single hexadecimal digit, i.e. (1011)(1111) = BFh.
2. Convert 11001₂ to decimal using the bit weights:
16  8  4  2  1
 1  1  0  0  1
16 + 8 + 1 = 25. Therefore 11001₂ = 25₁₀.
5. Convert 29Bh to binary.
2    9    B
0010 1001 1011
Put together and drop the leading zeroes: 1010011011₂.
6. Convert 111101000100₂ to hexadecimal. Group in nibbles, pack with zeroes where needed,
and represent each nibble with a hexadecimal digit:
1111 0100 0100
F    4    4
= F44h
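The conversions worked above can be cross-checked with Python's built-in base handling:

```python
# Decimal <-> binary <-> hexadecimal conversions from the examples.
print(bin(191))             # 0b10111111
print(hex(0b10111111))      # 0xbf   (nibbles 1011 1111 -> B F)
print(int("11001", 2))      # 25
print(bin(0x29B))           # 0b1010011011
print(hex(0b111101000100))  # 0xf44
```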
Convert 6B2h to decimal using the positional weights:
2 x 16⁰ = 2 x 1 = 2
B x 16¹ = 11 x 16 = 176
6 x 16² = 6 x 256 = 1536
2 + 176 + 1536 = 1714₁₀.
Binary    Decimal
1101₂  =  13₁₀
1001₂  =   9₁₀
10. Add 23D9h + 94BEh. Start from the lsd. If the result of the addition is less
than 16, write it as the result for that position; otherwise, subtract 16 from the
result of the addition, write the remainder as the answer, and carry 1 to
the next digit.
  23D9h
+ 94BEh
  B897h
Subtract 2B8h from 59Fh, borrowing 16 where a digit is too small:
  59Fh
- 2B8h
  2E7h
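Both results can be confirmed directly, since hexadecimal literals are native in most languages:

```python
# Verifying the hexadecimal addition and subtraction worked above.
print(hex(0x23D9 + 0x94BE))  # 0xb897
print(hex(0x59F - 0x2B8))    # 0x2e7
```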
Binary Complements
In decimal arithmetic, every number has an additive
complement, i.e., a value that when added to the original
number results in a zero. For example, 5 and -5 are additive
complements because 5 + (-5) = 0. This section describes the
two primary methods used to calculate the complements of a
binary value.
One's Complement
When asked to come up with a pattern of ones and zeros that
when added to a binary value would result in zero, most people
respond with, "just flip each bit in the original value." This
"inverting" of each bit, substituting 1's for all of the 0's and 0's for
all of the 1's, results in the 1's complement of the original value.
  10010110
+ 01101001
  11111111
If the two values were additive complements, the result should
be zero, right? Well, that takes us to the 2's complement.
Two's Complement
The result of adding an n-bit number to its one's complement
is always an n-bit number with ones in every position. If we add
1 to that result, our new value is an n-bit number with zeros in
every position and an overflow or carry to the next highest
position, the (n+1)th column, which corresponds to 2^n. For our 8-
bit example above, the result of adding 10010110₂ to 01101001₂ is
11111111₂. Adding 1 to 11111111₂ gives 100000000₂, i.e. zeros in all
eight bit positions plus a carry out; discarding the carry leaves zero, so the
one's complement plus 1 (here 01101010₂) is the two's complement of the
original value.
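The 8-bit example can be reproduced with bitwise operations; masking with 0xFF models the fixed 8-bit width:

```python
# One's and two's complement of the 8-bit value 10010110 from the text.
value = 0b10010110
ones = value ^ 0xFF          # flip every bit: one's complement
twos = (ones + 1) & 0xFF     # add 1: two's complement

print(format(ones, "08b"))                   # 01101001
print(format((value + ones) & 0xFF, "08b"))  # 11111111
print(format((value + twos) & 0xFF, "08b"))  # 00000000 (carry out discarded)
```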
MODULE FIVE: FLOATING POINT ARITHMETIC
Performing real-number arithmetic in software (handling fractions, very large values, very
small values, etc.) is time consuming for the CPU and the programmers.
Example
Convert 0.078125₁₀ to short real.
1. 0.078125₁₀ = 0.000101₂
2. 0.000101₂ = 1.01 x 2⁻⁴
3. Biased exponent = 7F + (−4)
= 7F − 4 = 7Bh
= 01111011₂
4. Sign bit = 0
5. Mantissa: suppress the 1 before the significand and post-pad with 0s
to make 23 bits:
0100 0000 0000 0000 0000 000
6. Put it all together: 1 sign bit + 8 biased exponent bits + 23 mantissa bits
0011 1101 1010 0000 0000 0000 0000 0000
= 3DA00000h
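The hand conversion can be verified with Python's struct module, which exposes the IEEE 754 single-precision encoding:

```python
import struct

# 0.078125 encoded as a big-endian IEEE 754 single-precision (short real) value.
packed = struct.pack(">f", 0.078125)
print(packed.hex().upper())  # 3DA00000
```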
Example
Convert -96.27 to single precision floating point format
Solution
- decimal 96.27 ≈ binary 1100000.01000101000111101
= 1.10000001000101000111101 E6
OR
1.10000001000101000111101 x 2⁺⁶
- Sign bit is 1 since the value is negative
- Exponent: 6 is added to the bias (7Fh)
= 7F + 6 = 85h
= 10000101₂
- Mantissa: obtained by suppressing the 1 before the
significand, i.e. copy the fraction bits and zero-pad to 23 bits
= 10000001000101000111101
Put it all together to make 32 bits:
1 sign bit + 8 biased exponent bits + 23 mantissa bits
= 1100 0010 1100 0000 1000 1010 0011 1101 = C2C08A3Dh
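The -96.27 conversion can likewise be checked against Python's struct module, which produces the IEEE 754 single-precision bit pattern:

```python
import struct

# -96.27 as a big-endian IEEE 754 single-precision value:
# sign 1, biased exponent 85h, 23-bit mantissa as derived above.
packed = struct.pack(">f", -96.27)
print(packed.hex().upper())  # C2C08A3D
```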
Example (long real)
1.00110000011 E7
- Sign bit is 0 since the value is positive
- Biased exponent = 3FF + 7
= 406h
= 10000000110₂
- Mantissa = 0011 0000 0110 0000 0000 0000 0000 …
Example
Convert 7.525₁₀.
Solution
- decimal 7.525 ≈ binary 0111.100001100…
Example:
Convert 97.78125₁₀ to temporary real.
Solution
decimal 97.78125 = binary 1100001.11001₂
Scientific binary format = 1.10000111001 E6
- Sign bit is 0
- Biased exponent = 3FFF + 6
= 4005h = 100000000000101₂
- Mantissa = 110000111001 padded with 0s to 64 bits (the temporary-real
format stores the 1 before the binary point explicitly)
Put it all together to obtain the 80-bit word length:
= 0100 0000 0000 0101 1100 0011 1001 0000 … 0000
= 4005 C390 0000 0000 0000h
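Python's struct module has no 80-bit format, so the temporary-real encoding can be assembled by hand from the three fields derived above. The field widths (1 sign bit, 15 exponent bits, 64 mantissa bits with an explicit integer bit) follow the note:

```python
# Assembling the 80-bit temporary-real encoding of 97.78125 field by field.
sign = 0
exponent = 0x3FFF + 6                           # bias 3FFFh plus the power of 2
mantissa_bits = "110000111001".ljust(64, "0")   # 1.10000111001..., integer bit explicit
word = (sign << 79) | (exponent << 64) | int(mantissa_bits, 2)
print(format(word, "020X"))                     # 4005C390000000000000
```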
Practice Exercise
1. Convert to short real, long real and temporary real:
a. −347.625₁₀
b. −30.625₁₀
c. 345.275₁₀
4. The stack is accessed via a stack index pointing to the top of the
stack (TOS).
5. The stack registers relative to the TOS are
called St(i), i = 0, 1, …, 7; i.e.
St(0) = TOS
Solution
St(i) = R((i + j) mod 8), given that St(0) = R5 (so j = 5)
St (1) = R((1+5) mod 8) = R6
St (2) = R((2+5) mod 8) = R7
St (3) = R((3+5) mod 8) = R0
St (4) = R((4+5) mod 8) = R1
St (5) = R((5+5) mod 8) = R2
St (6) = R((6+5) mod 8) = R3
St (7) = R((7+5) mod 8) = R4
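The register-renaming arithmetic in the solution is easy to express directly. The top parameter is a feature of this sketch rather than anything fixed by the 8087; top=5 reproduces the given St(0) = R5 case.

```python
# st(i) -> physical register mapping: st(i) = R((i + j) mod 8),
# where j is the index of the register currently holding the top of stack.
def physical_register(i, top=5):
    return (i + top) % 8

print([physical_register(i) for i in range(8)])  # [5, 6, 7, 0, 1, 2, 3, 4]
```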
7. The data format for the data registers is temporary real: an 80-bit
word length, with a 64-bit mantissa, a 15-bit exponent and 1 bit to
represent the sign of the data.
Data types
The 8087 supports 7 data types.
- Data are stored in temporary real storage format hence they are
converted to temporary real before storage on the stack.
- The status register is used as a flag register for reporting various states of the
coprocessor. It is 16 bits wide and addressable by bit, as in the 80x86.
- Tag register is used to depict the state of the data register. It uses 2
bits for each data register.
8087 Addressing mode
1. Register Addressing
The registers R0–R7 (referenced relative to the top of stack as St(0)–St(7)) are the
only ones allowed to hold operands. The use of 8086 registers is not allowed.
- All 8087 instruction mnemonics begin with F,
e.g. FDIV st(1), st(3)