Module 2 Basic Processing Unit
1. Central Processing Unit (CPU):
The CPU is the brain of the computer, responsible for executing instructions and
performing calculations. It is made up of several key components:
• Arithmetic and Logic Unit (ALU): Performs arithmetic and logical operations
like addition, subtraction, and comparisons.
• Registers: Small, high-speed storage locations within the CPU used to store
intermediate data and instructions.
• Program Counter (PC): A register that holds the address of the next instruction
to be executed.
2. Memory:
Memory stores data and instructions that the CPU needs to perform tasks. It can be
categorized into:
• Primary Memory (RAM): Random Access Memory (RAM) is used for temporarily
storing data and instructions that the CPU needs quickly.
• Cache Memory: A small, fast type of memory that stores frequently accessed
data to speed up access times between the CPU and RAM.
3. Buses:
• Address Bus: Carries the memory or I/O address from which data is read or to
which data is written.
• Data Bus: Carries the actual data being transferred between components.
• Control Bus: Carries control signals that manage the operations of the computer.
4. Input/Output (I/O) Systems:
Input and output systems manage the communication between the computer and
external devices like keyboards, monitors, printers, and network interfaces.
• Input Devices: Hardware that allows the computer to receive data (e.g.,
keyboard, mouse).
• Output Devices: Hardware that allows the computer to present data (e.g.,
monitor, printer).
• I/O Controllers: Manage the interaction between the CPU and input/output
devices, ensuring data is transferred correctly.
5. Instruction Set Architecture (ISA):
The ISA defines the set of instructions that a CPU can understand and execute. It specifies
the operations available, how operands are accessed, and how results are stored. Key
aspects of ISA include:
• Addressing Modes: The ways in which memory locations can be specified within
instructions (e.g., direct, indirect, indexed).
• Machine Language: The binary code that the CPU directly executes. It consists
of sequences of 0s and 1s that represent operations and data.
6. Control Unit:
The Control Unit is responsible for directing the operation of the processor by fetching
instructions from memory, decoding them, and executing them. It uses a set of control
signals to manage the execution of operations across different components.
8. Pipelining:
Pipelining overlaps the fetch, decode, and execute stages of successive instructions so
that several instructions are in progress at once, increasing instruction throughput.
9. Clock:
The clock provides the timing signals that synchronize the operation of all components
in the computer system. Each cycle of the clock allows the CPU to execute an instruction
or perform a part of an operation.
10. Interrupts:
Interrupts are signals that temporarily halt the normal execution flow of the program to
handle a special event, such as input from an I/O device. The CPU saves its state,
processes the interrupt, and then resumes the previous task. Interrupts enable
multitasking and real-time responses in systems.
The bus is a system of pathways used for communication between components (CPU,
memory, I/O devices). Common bus organizations include:
• Single Bus Architecture: One bus shared by the CPU, memory, and I/O
devices.
• Multiple Bus Architecture: Separate buses for different types of
communication, such as memory and I/O buses.
Modern computers use a hierarchy of memory systems to balance speed and cost:
• Cache Memory: Small, fast memory close to the CPU, holding frequently used
data and instructions.
• Main Memory (RAM): Larger but slower than cache, used to store data and
instructions for active processes.
• Secondary Storage: Non-volatile storage like hard drives or SSDs, used for
long-term data storage.
• Character Encoding: Methods like ASCII and Unicode for representing text data
in binary form.
• The system bus connects major components like the CPU, memory, and I/O
devices. The bus controller manages access to the bus and ensures that data is
transmitted in a synchronized manner.
Virtual memory is a memory management technique that gives the illusion of a larger
amount of memory than physically available by swapping data between RAM and disk
storage. This enables efficient multitasking and the execution of larger programs than
the system's physical memory alone would allow.
The execution of a complete instruction in a computer system involves several steps that
occur in a sequence. These steps are generally part of the fetch-decode-execute cycle
(also known as the instruction cycle), which describes how a computer processes
instructions from start to finish.
Here’s a breakdown of the typical steps involved in the execution of a complete
instruction:
1. Fetch:
• Steps:
o The Program Counter (PC) holds the address of the next instruction to
be executed.
o The Control Unit (CU) retrieves the instruction from memory at the
address specified by the PC.
o After fetching, the Program Counter (PC) is updated to point to the next
instruction (usually by incrementing it, though in some cases it might
jump to another address due to branches or jumps).
• Example:
o If the PC holds address 1000, the instruction at that address (let's say an
ADD operation) is fetched from memory and stored in the Instruction
Register (IR).
2. Decode:
• Steps:
o The Control Unit examines the opcode in the Instruction Register to
determine which operation is to be performed.
o The Addressing Mode is also decoded to figure out how operands are
accessed (e.g., immediate, direct, indirect).
• Example:
o If the instruction is ADD R1, R2, R3, the opcode (ADD) tells the processor
to add the contents of registers R2 and R3, and store the result in register
R1.
3. Operand Fetch:
• Steps:
o If the operands are not already in registers, they are fetched from
memory.
o If the instruction involves an immediate operand (e.g., ADD R1, #5), the
operand can be extracted directly from the instruction itself.
o This step may involve accessing the memory or registers to get the
required data.
• Example:
o If the instruction ADD R1, R2, #10 uses an immediate operand (10), the
operand is extracted directly from the instruction and does not require
fetching from memory.
4. Execute:
• Steps:
o The Arithmetic and Logic Unit (ALU) or other functional units perform
the required operation (e.g., arithmetic, logical, or data transfer
operations).
• Example:
o For an ADD instruction (ADD R1, R2, R3), the ALU will add the contents of
R2 and R3 and store the result in R1.
5. Memory Access:
• Steps:
o The memory address is calculated, and data is either read from or written
to the memory.
• Example:
o If the instruction is STORE R1, 500, the contents of register R1 are written
to memory location 500.
6. Writeback:
• Steps:
o After the execution step, the result of the operation is written back to the
appropriate destination.
o If the instruction modifies a register (e.g., ADD), the result will be stored
in a register.
o If the instruction modifies memory (e.g., STORE), the result will be written
to memory.
• Example:
o For a register instruction like ADD R1, R2, R3, the computed sum is
written into R1; for STORE R1, 500, the value is written to memory
location 500.
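As a rough illustration, the whole cycle can be sketched as a loop in Python. The instruction format, mnemonics, and register names below are invented for illustration and are not taken from any real ISA.

```python
def run(program, registers):
    """Toy fetch-decode-execute loop. Instructions are tuples such as
    ("ADD", dst, src1, src2); the mnemonics are illustrative only."""
    pc = 0
    while pc < len(program):
        instruction = program[pc]        # fetch: read instruction at address PC
        pc += 1                          # update PC to point to the next instruction
        opcode, *operands = instruction  # decode: split opcode and operands
        if opcode == "ADD":              # execute, then write back to a register
            dst, s1, s2 = operands
            registers[dst] = registers[s1] + registers[s2]
        elif opcode == "ADDI":           # immediate operand comes from the instruction
            dst, s1, imm = operands
            registers[dst] = registers[s1] + imm
        elif opcode == "JMP":            # branch: load PC with a new address
            (pc,) = operands
    return registers

regs = run([("ADD", "R1", "R2", "R3"), ("ADDI", "R1", "R1", 10)],
           {"R1": 0, "R2": 2, "R3": 3})
# R1 now holds (2 + 3) + 10 = 15
```

A real CPU performs these phases in hardware, often overlapped by pipelining, but the control flow is the same.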
Multiple Bus Organization is a type of computer system architecture that uses more
than one bus to manage the communication between different components like the CPU,
memory, and I/O devices. In contrast to a single bus organization, where all
components share the same bus for data transfer, multiple bus systems allow for parallel
communication, which can improve performance by reducing contention for bus access.
o Data Bus: Carries the actual data being transferred between components.
o Address Bus: Used to specify memory addresses for read and write
operations.
o Control Bus: Carries control signals to manage data transfer (e.g., read,
write, interrupt).
o Parallelism: By using separate buses for different functions (e.g., one for
memory, one for I/O), operations can be performed simultaneously
without interference. For example, the CPU can access memory while also
communicating with I/O devices.
4. Bus Control: In a multiple bus system, the bus controller ensures that the
buses are used efficiently. It controls the arbitration process, which is the method
by which the system decides which component gets access to the bus. This
prevents bus conflicts and ensures proper synchronization between components.
In addition to arbitration, the controller also generates control signals to specify which
operation should be performed on the bus (such as reading from or writing to memory,
or transferring data to an I/O device).
o In a two-bus system, one bus typically connects the CPU to memory
while the other connects the CPU to I/O devices.
o In a three-bus system:
▪ Bus 1 (Memory Bus): Used for data transfer between the CPU
and memory.
▪ Bus 2 (I/O Bus): Used for communication between the CPU and
I/O devices.
▪ Bus 3 (General Bus): Used for control signals and other general
traffic.
• The CPU needs to fetch an instruction from memory and also transfer data to
an I/O device.
• Bus 1 (Memory Bus) can be used to fetch data from memory, while Bus 2 (I/O
Bus) can be used to send data to an I/O device.
• At the same time, Bus 3 (General Bus) can be used for control signals, ensuring
everything operates in sync.
This parallel execution of operations significantly improves system efficiency and overall
performance compared to a single-bus system, where the CPU would have to wait for
the data transfer to memory or I/O devices to complete before performing another task.
1. Complexity:
o The design and implementation of multiple buses are more complex than a
single-bus system. Multiple controllers, more sophisticated synchronization
mechanisms, and routing logic are required to ensure that each bus
functions properly and efficiently.
2. Cost:
o Multiple buses require more hardware components, which can increase the
cost of the system. Additionally, managing the data transfer between
multiple buses adds overhead in terms of control and hardware.
3. Signal Interference:
o With more buses and longer signal paths, crosstalk and synchronization
issues between buses become harder to manage.
Hardwired Control
Hardwired Control is one of the two primary methods used to control the operation of
a computer's central processing unit (CPU), the other being microprogrammed
control. In a hardwired control system, the control signals required to execute
instructions are generated using fixed logic circuits, typically combinational logic such as
AND gates, OR gates, flip-flops, and decoders.
1. Fixed Logic Design:
o The control signals for instruction execution are determined by the design
of the hardware. These signals are generated using specific logic gates
and flip-flops connected in a predefined way.
o The control unit uses combinational logic circuits that directly implement
the control functions required to execute various instructions.
2. Control Signals:
o The control unit generates signals that control various parts of the CPU
(such as the ALU, registers, memory, and I/O devices).
3. Instruction Decoding:
o Based on the decoded opcode, the control unit generates the appropriate
control signals to carry out the instruction's operation.
4. Speed:
o The logic for instruction execution is fixed and does not require the
interpretation of a control memory, making execution faster.
5. Complexity:
o As the instruction set grows, the fixed logic becomes large and difficult to
modify, since any change requires redesigning the circuitry.
Working of Hardwired Control:
1. Fetch:
o The Program Counter (PC) provides the address of the next instruction.
The instruction is fetched from memory and placed in the Instruction
Register (IR).
o The opcode and any operand addresses are extracted from the instruction.
2. Decode:
o The opcode from the instruction is decoded by the control unit's decoder
or logic circuit.
o The control unit identifies the operation to be performed (e.g., ADD, SUB,
MOV) and generates the necessary control signals.
Advantages of Hardwired Control:
1. Speed:
o Hardwired control systems are typically faster because they use direct
logic circuits that immediately generate control signals, without needing to
interpret or fetch microinstructions as in microprogrammed control.
2. Predictability:
o Since the control signals are generated by fixed logic, the behavior of the
CPU is deterministic and predictable.
Disadvantages of Hardwired Control:
1. Limited Flexibility:
o Because the control logic is fixed in hardware, adding or modifying
instructions requires redesigning the circuit.
2. Design Challenges:
o Designing and troubleshooting the complex logic for a large instruction set
can be difficult. As the number of control signals and interactions grows,
ensuring the correctness and efficiency of the logic circuits becomes more
challenging.
Consider a simple instruction like ADD R1, R2, R3 (Add contents of R2 and R3 and store
the result in R1). The hardwired control unit would perform the following steps:
1. Fetch: The instruction is read from memory at the address in the PC and placed
in the Instruction Register; the PC is incremented.
2. Decode: The control unit decodes the opcode ADD and recognizes that it needs
to perform an addition operation.
3. Operand Fetch: The contents of registers R2 and R3 are routed to the ALU
inputs.
4. Execute: The ALU performs the addition of the values in R2 and R3 and stores
the result in R1.
Multiprogramming
Multiprogramming keeps several programs in memory at once so the CPU can switch
among them. Key concepts include multiprogramming itself, context switching,
memory management, and process control.
1. Program Loading:
o Multiple programs are loaded into the computer's main memory (RAM),
usually by the operating system's loader.
o Programs may be in different segments of memory, and the operating
system manages their allocation.
2. CPU Scheduling:
o A time slice or quantum may be assigned to each process, and when the
time slice expires, the CPU is given to the next process in the queue.
3. Context Switching:
o When the CPU switches from one process to another, the context switch
happens. The current process's state (registers, program counter, etc.) is
saved, and the state of the next process is loaded.
4. I/O Handling:
o When a running process requests I/O, it is moved to a waiting state and
the CPU is given to another process.
o Once the I/O operation finishes, the process is moved to the ready
queue, where it is ready to execute again.
5. Memory Management:
6. Process Termination:
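The time-slice scheduling and context-switching steps above can be sketched as a simple round-robin simulation. This is a toy model of the idea, not how any particular operating system implements it.

```python
from collections import deque

def round_robin(burst_times, quantum):
    """Simulate round-robin scheduling. Returns the timeline of
    (process id, time run) slices, with a context switch at each boundary."""
    remaining = list(burst_times)           # CPU time each process still needs
    ready = deque(range(len(burst_times)))  # the ready queue
    timeline = []
    while ready:
        pid = ready.popleft()               # dispatch the next ready process
        run = min(quantum, remaining[pid])  # run until the quantum expires
        remaining[pid] -= run
        timeline.append((pid, run))
        if remaining[pid] > 0:              # context switch: requeue the process
            ready.append(pid)
    return timeline

print(round_robin([3, 5, 2], quantum=2))
# [(0, 2), (1, 2), (2, 2), (0, 1), (1, 2), (1, 1)]
```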
Advantages of Multiprogramming:
1. Better CPU Utilization:
o The CPU is kept busy executing another process whenever the current
process waits for I/O.
2. Increased Throughput:
o By managing multiple processes at the same time, the system can handle
more tasks in a given period, which increases the overall system
throughput.
3. Efficient Resource Utilization:
o The system's resources, such as memory and I/O devices, can be utilized
optimally since the operating system can switch between tasks based on
their needs (e.g., CPU-bound vs. I/O-bound tasks).
Disadvantages of Multiprogramming:
1. Complexity:
o The operating system must implement scheduling, memory protection, and
synchronization, which adds significant complexity.
2. Overhead:
o Context switching consumes CPU time without doing useful work.
3. Deadlock:
o Processes can end up waiting indefinitely for resources held by one
another.
4. Resource Contention:
o Multiple processes compete for the CPU, memory, and I/O devices, which
can degrade performance.
Microprogrammed Control
Key concepts:
1. Microinstructions:
o Low-level instructions that specify the control signals to be asserted in one
step of instruction execution.
2. Control Memory:
o A special memory (often ROM) inside the control unit that stores the
microprograms.
3. Microprogram:
o A sequence of microinstructions that implements one machine-level
instruction.
4. Control Word:
o The pattern of bits in a microinstruction, where each bit or field
corresponds to a control signal.
5. Sequencing:
o The mechanism (e.g., a microprogram counter, MPC) that determines
which microinstruction executes next.
1. Instruction Fetch:
o The control unit identifies the opcode of the instruction and uses it to look
up the corresponding microprogram from control memory.
2. Microinstruction Execution:
o Once the control unit has the microprogram, it starts executing the
microinstructions stored in control memory.
3. Microprogram Sequencing:
o The sequencer fetches the next microinstruction based on the current step
in the instruction cycle. This could be a sequential fetch (incrementing the
MPC) or a jump (in case of branches or jumps in the microprogram).
4. Control Signal Generation:
o Each microinstruction asserts the control signals needed for operations
such as:
▪ ALU operations
▪ Register transfers
▪ Memory reads and writes
▪ I/O operations
5. Instruction Execution:
o The CPU executes the machine-level instruction by completing all the steps
defined by the microprogram. The microprogram ensures that the correct
control signals are generated at each step, guiding the CPU through the
entire instruction cycle (fetch, decode, execute, etc.).
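A microprogrammed control unit can be pictured as a table lookup: each opcode indexes a microprogram whose microinstructions list the control signals to assert. The opcode and signal names below are made up for illustration.

```python
# Hypothetical control memory: each opcode maps to a microprogram, i.e. a
# list of microinstructions, each a set of control-signal names to assert.
CONTROL_MEMORY = {
    "ADD": [
        {"PC_out", "MAR_in", "Read", "PC_increment"},  # fetch the instruction
        {"MDR_out", "IR_in"},                          # load the instruction register
        {"R2_out", "ALU_A_in"},                        # route the first operand
        {"R3_out", "ALU_add", "R1_in"},                # add and write back to R1
    ],
}

def execute(opcode):
    """Step through the microprogram, asserting each control word in turn."""
    for step, control_word in enumerate(CONTROL_MEMORY[opcode]):
        print(f"step {step}: assert {sorted(control_word)}")

execute("ADD")
```

Changing the instruction set here means editing the table, not redesigning logic, which is exactly the flexibility argument for microprogramming.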
Advantages of Microprogrammed Control:
1. Flexibility:
o The instruction set can be modified or extended by changing the
microprogram in control memory, without redesigning the hardware.
2. Simplicity:
o The control unit’s logic is simplified because it does not require complex
logic circuits for each instruction. Instead, the logic is replaced with
microinstructions that can be accessed from control memory.
Disadvantages of Microprogrammed Control:
1. Speed:
o Slower than hardwired control, because each machine instruction requires
fetching and interpreting microinstructions from control memory.
Consider the instruction ADD R1, R2, R3, which adds the contents of registers R2 and R3
and stores the result in R1.
1. Fetch the Instruction: Load the instruction ADD R1, R2, R3 into the
instruction register.
The addition and subtraction of signed numbers are fundamental operations in arithmetic
and computing. In most computers, signed numbers are represented using methods like
two's complement or sign-magnitude representation. Let's explore how signed
numbers are added and subtracted in these common representations.
1. Two's Complement Representation
In two's complement, positive numbers are represented in the usual binary form, and
negative numbers are represented by inverting all bits of the positive number (known as
the one's complement) and then adding 1 to the result.
The addition of two signed numbers in two's complement can be done using the standard
binary addition method. Here’s how it works:
1. Align the Numbers: Ensure both numbers are of the same bit width (e.g., 8
bits, 16 bits, etc.).
2. Add the Numbers: Add the two numbers as you would unsigned numbers,
including the carry bit.
3. Check for Overflow: Overflow occurs when the two operands have the same
sign but the result has the opposite sign (equivalently, when the carry into the
most significant bit differs from the carry out of it); the result then lies outside
the representable range. The carry out of the MSB itself is simply discarded.
To subtract two signed numbers in two's complement, we use the following steps:
1. Negate the Subtrahend: Form the two's complement of the number being
subtracted (invert its bits and add 1).
2. Add the Numbers: Add the first number to the two's complement of the second
number (which represents its negation).
3. Check for Overflow: Again, ensure that the result fits within the allowable range
of values for the given bit width.
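The addition and subtraction rules above can be checked with a small sketch that models n-bit two's-complement arithmetic (the helper names are ours):

```python
def to_twos(x, bits):
    """Encode a signed integer into its n-bit two's-complement bit pattern."""
    return x & ((1 << bits) - 1)

def from_twos(v, bits):
    """Decode an n-bit two's-complement pattern back to a signed integer."""
    return v - (1 << bits) if v >> (bits - 1) else v

def add_signed(a, b, bits):
    """Add two signed numbers. The carry out of the MSB is discarded by the
    mask; overflow is flagged when same-sign operands give an
    opposite-sign result."""
    raw = (to_twos(a, bits) + to_twos(b, bits)) & ((1 << bits) - 1)
    result = from_twos(raw, bits)
    overflow = (a >= 0) == (b >= 0) and (result >= 0) != (a >= 0)
    return result, overflow

def sub_signed(a, b, bits):
    """Subtract by adding the two's complement (invert bits, add 1) of b."""
    neg_b = from_twos((~to_twos(b, bits) + 1) & ((1 << bits) - 1), bits)
    return add_signed(a, neg_b, bits)

print(add_signed(5, -3, 8))    # (2, False)
print(add_signed(100, 50, 8))  # (-106, True): overflow in 8 bits
print(sub_signed(5, 3, 8))     # (2, False)
```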
2. Sign-Magnitude Representation
In sign-magnitude representation, the most significant bit (MSB) indicates the sign of
the number: a 0 in the MSB means a positive number, and a 1 means a negative
number. The remaining bits represent the magnitude of the number.
Addition rules:
1. Both Positive: Add the magnitudes; the result is positive.
2. Both Negative: Add the magnitudes; the result is negative.
3. One Positive and One Negative: Subtract the smaller magnitude from the
larger magnitude and assign the sign of the larger number to the result.
Example: adding +5 and −3:
1. The magnitudes are 5 and 3, and the signs differ.
2. Subtract the smaller magnitude from the larger: 5 − 3 = 2.
3. The larger magnitude (5) is positive, so the result is +2.
3. One Positive and One Negative: For subtraction, add the magnitudes and give
the result the sign of the minuend (the first number).
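A sketch of the sign-magnitude addition rules, with a number written as a (sign, magnitude) pair, 0 for positive and 1 for negative (this pair encoding is chosen here for illustration):

```python
def sm_add(sign1, mag1, sign2, mag2):
    """Add two sign-magnitude numbers given as (sign bit, magnitude)."""
    if sign1 == sign2:
        # Same sign: add the magnitudes and keep the common sign.
        return sign1, mag1 + mag2
    # Different signs: subtract the smaller magnitude from the larger and
    # take the sign of the number with the larger magnitude.
    if mag1 >= mag2:
        return sign1, mag1 - mag2
    return sign2, mag2 - mag1

print(sm_add(0, 5, 1, 3))  # (0, 2): (+5) + (-3) = +2
print(sm_add(1, 5, 1, 3))  # (1, 8): (-5) + (-3) = -8
```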
• Overflow occurs when the result of an operation exceeds the range that can be
represented with the given number of bits.
The design of fast adders is crucial for improving the performance of arithmetic
operations in digital circuits, especially for processors and other computational hardware.
Adders are essential components in almost every digital system, used to add two
numbers. Fast adders are designed to minimize delay and increase the speed of addition
operations. Below are some of the commonly used designs for fast adders:
1. Ripple Carry Adder (RCA)
The Ripple Carry Adder is the simplest type of adder, where each bit is added
sequentially from the least significant bit (LSB) to the most significant bit (MSB). The
carry bit from one stage is passed to the next stage, hence the term "ripple."
Structure:
• Each full adder consists of an XOR gate for sum, an AND gate for carry, and an
OR gate to combine carries.
• The carry bit "ripples" through each stage from right to left.
Advantages:
• Simple design.
• Easy to implement.
Disadvantages:
• Slow due to carry propagation. For an n-bit adder, the time complexity is
O(n), since the carry from the LSB must propagate to the MSB.
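The ripple-carry structure can be modeled bit by bit in a short sketch, where the carry output of each full adder feeds the next stage:

```python
def full_adder(a, b, cin):
    """One full-adder stage built from XOR/AND/OR, as described above."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_carry_add(a_bits, b_bits):
    """Add two equal-length bit lists, least significant bit first.
    The carry 'ripples' from stage to stage, so the delay grows as O(n)."""
    carry = 0
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry

# 5 (101) + 3 (011), LSB first: result bits 000 with carry-out 1, i.e. 1000 = 8
print(ripple_carry_add([1, 0, 1], [1, 1, 0]))  # ([0, 0, 0], 1)
```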
2. Carry Look-Ahead Adder (CLA)
The Carry Look-Ahead Adder speeds up addition by computing carries in advance from
generate and propagate signals, instead of waiting for them to ripple.
Structure:
• For each bit i, the Generate and Propagate signals are calculated:
G_i = A_i AND B_i, P_i = A_i XOR B_i.
• The carry for each bit position can be calculated quickly using the generate and
propagate signals, allowing the carries to be calculated in parallel:
C_{i+1} = G_i OR (P_i AND C_i).
This look-ahead process allows for the carry to be determined in fewer stages, typically
in O(log n) time.
Advantages:
• Faster than RCA for large bit-widths due to reduced carry propagation time.
Disadvantages:
• More complex logic and requires additional hardware for generating carry-
lookahead signals.
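A behavioral sketch of the generate/propagate computation follows. In hardware the carry recurrence is expanded so all carries come out in parallel; the loop here just shows the logic:

```python
def cla_add(a_bits, b_bits, c0=0):
    """Carry look-ahead addition on LSB-first bit lists, using
    G_i = A_i AND B_i, P_i = A_i XOR B_i, C_{i+1} = G_i OR (P_i AND C_i)."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate signals
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate signals
    carries = [c0]
    for i in range(len(a_bits)):
        # Hardware expands this recurrence so every carry is a direct
        # function of g, p, and c0, evaluated in parallel.
        carries.append(g[i] | (p[i] & carries[i]))
    sum_bits = [p[i] ^ carries[i] for i in range(len(a_bits))]
    return sum_bits, carries[-1]

print(cla_add([1, 0, 1], [1, 1, 0]))  # 5 + 3 = 8 -> ([0, 0, 0], 1)
```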
3. Carry Select Adder
Structure:
• The adder is divided into smaller blocks. Each block computes two possible sums,
one for carry-in 0 and one for carry-in 1.
• The result is selected based on the actual carry-in for that block.
Advantages:
• Faster than a ripple carry adder because each block computes multiple possible
sums simultaneously.
Disadvantages:
• Requires extra hardware to compute the two possible sums and select the correct
one.
• More hardware area and power consumption than a ripple carry adder.
4. Carry Skip Adder
The Carry Skip Adder improves on the carry propagation by skipping over groups of
bits that do not generate a carry. If a block of bits does not generate a carry, the carry
can "skip" to the next group of bits.
Structure:
• Each block has an additional skip control that determines if the carry should be
propagated or skipped.
Advantages:
• Faster than a ripple carry adder, especially when the probability of generating a
carry in blocks is low.
Disadvantages:
• The worst-case delay still depends on carry propagation within a block, and the
skip logic adds hardware complexity.
5. Kogge-Stone Adder
Structure:
• The Kogge-Stone adder generates carry in parallel using a series of stages. Each
stage computes the carry for a group of bits, and the carries are propagated in
parallel through these stages.
• The adder uses a prefix sum approach to compute the carry bits in logarithmic
time.
Advantages:
• Very fast (time complexity is O(log n)) and is one of the fastest adder
designs.
Disadvantages:
• Extremely complex and requires a large number of logic gates, leading to
increased area and power consumption.
6. Brent-Kung Adder
The Brent-Kung Adder is another type of parallel prefix adder that is optimized for
lower hardware complexity while maintaining relatively fast performance.
Structure:
• Similar to the Kogge-Stone adder, the Brent-Kung adder uses a tree structure to
calculate the carries, but with fewer stages. This results in reduced hardware
complexity.
Advantages:
• Less hardware overhead compared to the Kogge-Stone adder while still achieving
logarithmic time complexity for carry propagation.
• Suitable for applications that require a balance between speed and area.
Disadvantages:
• More complex than the simpler adders like ripple carry or carry select.
Adder                          Time Complexity   Hardware Cost   Speed   Power   Area
Ripple Carry Adder (RCA)       O(n)              Low             Slow    Low     Small
Carry Look-Ahead Adder (CLA)   O(log n)          High            Fast    High    Large
Multiplication of Positive Numbers
Multiplying positive numbers is one of the fundamental operations in arithmetic and digital
circuits. In binary, multiplication of two positive numbers involves a series of bitwise
operations, similar to the manual multiplication method but adapted for the binary
system. Below are the methods used for multiplication of positive numbers in both
decimal and binary representations, along with the details of multiplication in digital
circuits.
In decimal, multiplying two positive numbers is done using the standard long
multiplication method. Here’s an overview of the steps:
1. Multiply each digit of the first number by each digit of the second number.
2. Shift each partial product according to the position of the digit it was multiplied
by.
3. Add the partial products to obtain the final result.
Example: 23 × 17:
    23
  x 17
  -----
   161   (23 × 7)
   230   (23 × 10)
  -----
   391
Example: multiply 101_2 (5) by 11_2 (3):
1. Step 1: Write down the multiplicand (101_2) and the multiplier (11_2).
2. Step 2: Multiply the first number by each bit of the second number (from right to
left), shifting accordingly:
• 101_2 × 1 = 101_2
• 101_2 × 1, shifted left one position = 1010_2
3. Step 3: Add the partial products: 101_2 + 1010_2 = 1111_2 = 15.
Booth’s Algorithm is a more efficient method for multiplying signed binary numbers, but
it can also be used for multiplying positive numbers. It reduces the number of operations
by combining partial products.
Steps:
1. Examine adjacent pairs of bits of the multiplier (appending an extra 0 to the
right) to decide the action at each step.
2. Use the pattern of bits in the multiplier to generate partial products and shift
them accordingly.
3. Add or subtract the appropriate partial products based on the current bit of the
multiplier.
Booth’s Algorithm is especially useful for multiplying large binary numbers efficiently.
In digital circuits, multiplication is typically achieved using logic gates, where the
operations of shifting and adding are done using hardware components. There are
several methods to implement binary multiplication in hardware:
Shift-and-Add Multiplier
A shift-and-add multiplier computes the product by repeatedly:
• Shifting the multiplicand (the number being multiplied) based on the current bit
of the multiplier.
• Adding the shifted multiplicand to the result if the corresponding multiplier bit is
1.
Example:
Let’s multiply A = 5 and B = 3, where A = 101_2 and B = 11_2:
• Multiply A by the rightmost bit of B (which is 1), giving 101_2.
• Shift A left by 1 bit and multiply by the next bit of B (which is 1), giving
1010_2.
• Add the partial products: 101_2 + 1010_2 = 1111_2 = 15.
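The shift-and-add procedure maps directly to a few lines of code:

```python
def shift_and_add(a, b):
    """Multiply two non-negative integers by examining each bit of the
    multiplier b and adding a correspondingly shifted copy of a."""
    result = 0
    shift = 0
    while b:
        if b & 1:          # multiplier bit is 1: add the shifted multiplicand
            result += a << shift
        b >>= 1            # move to the next multiplier bit
        shift += 1         # the multiplicand shifts left one more position
    return result

print(shift_and_add(5, 3))    # 15, matching the 101_2 x 11_2 example
print(shift_and_add(23, 17))  # 391
```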
Array Multiplier
An array multiplier is a grid-based approach that uses multiple AND gates to calculate
the partial products. Each partial product is then summed using full adders. The array
multiplier is efficient but can require a large number of gates for larger numbers.
Carry-Save Multiplier
A carry-save multiplier defers carry propagation by keeping separate sum and carry
vectors until a final addition, which speeds up the accumulation of partial products.
5. Speeding Up Multiplication
• Wallace Tree: The Wallace tree multiplier reduces the number of stages needed
to add the partial products, improving the overall speed of multiplication.
Signed-operand Multiplication
For signed binary numbers, multiplication involves managing the sign of the result in
addition to the magnitude. The process can be broken down into several key steps,
particularly in the context of binary arithmetic.
In digital systems, signed numbers are typically represented using Two’s complement
or Sign and Magnitude representations. However, Two’s complement is the most
commonly used representation because it simplifies arithmetic operations.
Two's Complement Representation:
A negative number is formed by inverting the bits of its positive counterpart and adding
1. For example, in 8 bits, +5 = 00000101_2 and −5 = 11111011_2.
1. Ignore the Sign: First, perform the multiplication as if both numbers were
positive, just like regular binary multiplication.
2. Determine the Sign of the Result: After computing the result, determine the
sign of the product based on the following rules:
o If both operands are positive or both operands are negative, the result is
positive.
o If one operand is positive and the other is negative, the result is negative.
3. Apply the Sign: If the product is negative, convert the result to Two’s
complement (if needed) to represent the negative number.
Booth’s Algorithm proceeds as follows:
1. Initialization:
o Load the multiplicand into M and the multiplier into Q, and append an
extra bit Q_-1 = 0 to the right of Q.
o Extend the registers so the product has enough bits (usually double the
number of bits of the operands).
2. Determine the Action: Examine the pair of bits formed by Q_0 (the least
significant bit of the multiplier) and Q_-1 (the extra bit): for 01, add M to the
accumulator; for 10, subtract M; for 00 or 11, do nothing.
3. Shift: After each step, arithmetically shift the entire pair of registers
(accumulator and multiplier) one bit to the right.
4. Repeat: Continue this process for the total number of bits (usually the number of
bits in the multiplier).
1. Represent numbers:
o M = 5 = 0101_2
o Q = −3 = 1101_2
2. Initialization:
o A = 0000_2
o Q_-1 = 0
At the end, Booth’s algorithm yields the correct product, which is −15 (in 8-bit
Two’s complement: 11110001_2).
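The steps above can be collected into a working radix-2 Booth multiplier sketch; the register handling and helper names here are ours, not from a specific hardware design:

```python
def booth_multiply(m, q, bits):
    """Radix-2 Booth's algorithm for signed, bits-wide operands."""
    mask = (1 << bits) - 1
    A, Q, Q_1, M = 0, q & mask, 0, m & mask
    for _ in range(bits):
        q0 = Q & 1
        if q0 == 1 and Q_1 == 0:        # bit pair 10: subtract the multiplicand
            A = (A - M) & mask
        elif q0 == 0 and Q_1 == 1:      # bit pair 01: add the multiplicand
            A = (A + M) & mask
        Q_1 = q0
        # Arithmetic right shift of the combined (A, Q) register pair.
        combined = ((A << bits) | Q) >> 1
        if A >> (bits - 1):             # replicate the sign bit of A
            combined |= 1 << (2 * bits - 1)
        A, Q = (combined >> bits) & mask, combined & mask
    product = (A << bits) | Q
    if product >> (2 * bits - 1):       # interpret as a signed 2*bits result
        product -= 1 << (2 * bits)
    return product

print(booth_multiply(5, -3, 4))  # -15, i.e. 11110001_2 in 8 bits
```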
When performing multiplication, sign extension may be necessary to ensure that the
result fits within the required number of bits. This ensures that the sign of the number is
correctly handled during the arithmetic process, particularly when the operands involve
both positive and negative numbers.
For example, if the numbers are represented in 4 bits and the product requires more bits
to represent the result, sign extension ensures that the most significant bit (the sign bit)
is correctly propagated into the extended bits.
Hardware for signed-operand multiplication typically includes:
• Shift registers for shifting the operands during the multiplication process.
• Adders for adding or subtracting the multiplicand and accumulator based on the
sign of the multiplier.
• Two’s complement arithmetic for handling the sign of operands and the result.
Multiplication of signed numbers in hardware can be done efficiently using algorithms like
Booth's Algorithm or by using array multipliers with sign extension.
Fast Multiplication
1. Shift-and-Add Multiplication
This is the basic method of binary multiplication, but it can be made faster through
optimization. The process works by shifting the multiplicand and adding it based on the
bits of the multiplier.
• Basic Idea: Multiply the multiplicand by each bit of the multiplier, shifting the
multiplicand to the left by one position for each higher bit of the multiplier.
2. Booth’s Algorithm
Booth’s algorithm is a fast multiplication method for signed binary numbers. It reduces
the number of additions and shifts needed by considering pairs of bits at a time. This
algorithm can be faster than the traditional shift-and-add method, especially when there
are a lot of consecutive 1s in the binary representation of the multiplier.
• Pairwise Checking: The algorithm processes pairs of bits from the multiplier
and applies different operations (add, subtract, or no operation) based on the
current pair.
• Shift: The result is shifted after each step, and the operation continues for the
length of the multiplier.
Booth’s algorithm is efficient for multiplying signed numbers as it minimizes the number
of partial products needed.
3. Karatsuba Multiplication
Karatsuba multiplication is a divide-and-conquer algorithm that reduces the complexity
of multiplying large numbers. It breaks down the multiplication of two large numbers
into smaller multiplications and additions.
The key advantage of Karatsuba’s algorithm is that it reduces the time complexity from
O(n^2) to O(n^{log_2 3}) ≈ O(n^{1.585}), making it much faster for large numbers.
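A compact recursive sketch of Karatsuba's three-multiplication trick, splitting on bit positions:

```python
def karatsuba(x, y):
    """Multiply non-negative integers with three recursive multiplications
    instead of four, giving O(n^{log2 3}), about O(n^{1.585})."""
    if x < 16 or y < 16:                  # small operands: multiply directly
        return x * y
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x1, x0 = x >> half, x & mask          # x = x1 * 2^half + x0
    y1, y0 = y >> half, y & mask
    z2 = karatsuba(x1, y1)                # product of the high parts
    z0 = karatsuba(x0, y0)                # product of the low parts
    z1 = karatsuba(x1 + x0, y1 + y0) - z2 - z0   # cross terms, one multiply
    return (z2 << (2 * half)) + (z1 << half) + z0

print(karatsuba(1234, 5678))  # 7006652
```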
4. Toom-Cook Multiplication
Toom-Cook is an extension of Karatsuba’s algorithm that splits the numbers into more
parts (three or more). It is particularly effective when dealing with large numbers and
provides even better asymptotic complexity than Karatsuba’s algorithm.
Toom-Cook is based on the idea of evaluating the numbers at various points and
interpolating the result. The more parts the numbers are split into, the more efficient
the algorithm becomes for very large numbers.
5. Schönhage-Strassen Algorithm
FFT-based multiplication methods use the Fast Fourier Transform (FFT) to efficiently
multiply large numbers. This approach relies on the fact that large number multiplication
can be viewed as the multiplication of polynomials.
The main steps are:
1. Treat the digits of each number as coefficients of a polynomial.
2. FFT: Transform both coefficient sequences with the FFT.
3. Pointwise Multiplication: Multiply the transformed sequences element by
element.
4. Inverse FFT: Apply the inverse FFT, then propagate carries to obtain the final
result.
7. Array Multipliers
In digital circuits, array multipliers are a class of multipliers where the partial products
are generated in parallel and then summed together. An array multiplier consists of a
grid of AND gates (for generating the partial products) and adders (for summing them).
• Basic Array Multiplier: The simplest form of an array multiplier, which can
become slow as the number of bits increases.
• Reduced Area Multiplier: Optimized to reduce the area of the multiplier circuit,
sometimes at the cost of speed.
8. Wallace Tree Multiplier
A Wallace tree multiplier is a fast hardware implementation that reduces the number
of stages needed to add the partial products. It uses full adders and half adders in a
tree structure to sum the partial products more quickly than in a simple array multiplier.
The Wallace tree multiplier is particularly effective in digital systems like digital signal
processing and FPGA designs, where speed is crucial.
9. Divide-and-Conquer Algorithms
Algorithms such as Karatsuba and Toom-Cook (described above) apply divide-and-
conquer to split large multiplications into smaller ones.
Integer Division
Integer division refers to the process of dividing one integer by another and obtaining
both the quotient and the remainder. The hardware or algorithm used for integer
division performs this operation based on a specific representation (e.g., signed or
unsigned integers) and handles the quotient and remainder accordingly. Integer division
is a crucial operation in arithmetic logic units (ALUs) and in implementing algorithms like
modular arithmetic, division for fixed-point numbers, and addressing.
At a high level, division is often done by repeatedly subtracting the divisor from the
dividend, but this process can be inefficient. More efficient methods, typically used in
hardware or optimized algorithms, work as follows:
For a dividend A and divisor B, division computes a quotient Q and remainder R such
that A = B × Q + R, where:
• Q is the quotient.
• R is the remainder, with the condition 0 ≤ R < |B|.
In binary division, this operation is more complex due to the binary nature of numbers.
In unsigned division, both the dividend and divisor are considered as non-negative
integers, and the result is simply the quotient and remainder.
• The quotient is obtained by performing the division, and the remainder is what’s
left after the division.
• Division operations for unsigned integers are relatively straightforward and can be
implemented via repeated subtraction, bit shifts, or using hardware divide
instructions.
For signed integers, division involves more considerations since both the dividend and
divisor can be positive or negative. The steps involved include:
• Handling signs: Convert both the dividend and divisor to positive numbers,
perform the division, and then apply the sign to the quotient based on the signs
of the original numbers.
• Remainder sign: The remainder is usually taken to have the same sign as the
dividend (following specific rounding rules).
3. Division Algorithms
Several division algorithms can be used for both signed and unsigned integer division.
Some of the main algorithms include:
Restoring Division:
In this method, the division process works by subtracting the divisor from the dividend
repeatedly. If the subtraction results in a negative value, the remainder is "restored,"
and the quotient is adjusted.
• Steps:
1. Shift the remainder register left by one bit, bringing in the next bit of
the dividend.
2. Subtract the divisor from the remainder.
3. If the result is negative, restore the remainder by adding the divisor back
and set the quotient bit to 0; otherwise, set the quotient bit to 1.
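For unsigned operands, the restoring steps above look like this in code:

```python
def restoring_divide(dividend, divisor, bits):
    """Unsigned restoring division: shift, trial-subtract, and restore on a
    negative result. Returns (quotient, remainder)."""
    R, Q = 0, dividend
    for _ in range(bits):
        # Shift the (R, Q) register pair left by one bit.
        R = (R << 1) | ((Q >> (bits - 1)) & 1)
        Q = (Q << 1) & ((1 << bits) - 1)
        R -= divisor                    # trial subtraction
        if R < 0:
            R += divisor                # restore; the quotient bit stays 0
        else:
            Q |= 1                      # subtraction succeeded: quotient bit 1
    return Q, R

print(restoring_divide(13, 4, 4))  # (3, 1): 13 = 4 * 3 + 1
```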
Non-Restoring Division:
This is a more efficient algorithm that eliminates the need for restoring the original
dividend after a subtraction.
• Steps:
1. Shift left, then subtract or add the divisor depending on the sign of the
current remainder.
2. Set the quotient bit to 1 if the new remainder is non-negative, else 0.
3. If the final remainder is negative, add the divisor back once to correct it.
c) Fast Division
This is an optimized method of division that uses an approximation of the quotient digits for faster processing. It allows division operations to be completed in fewer steps than the restoring division algorithm.
Modern processors often implement efficient division using hardware support. Many CPUs
include a divider unit that performs division using optimized hardware algorithms, such
as parallel division or reciprocal approximation methods, to achieve division with
reduced latency.
4. Division in Hardware
In a computer's Arithmetic Logic Unit (ALU) or a Floating-Point Unit (FPU), the
integer division process involves several steps:
• Shift and Subtract Approach: This method involves shifting the dividend and
subtracting the divisor. The quotient is incremented step-by-step as the division
proceeds.
• Binary Shifters: Shift registers can perform the division by repeated shifts and
subtractions.
A special case in integer division occurs when the divisor is a power of two. In this case,
division can be accomplished efficiently by bit shifting:
• Dividing by 2^k is equivalent to shifting the dividend right by k bits (for unsigned integers, A ÷ 2^k = A >> k).
This operation is extremely fast and is often used in algorithms that involve powers of two.
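A quick illustration in Python; note the difference between an arithmetic right shift (which floors) and truncating division for negative operands:

```python
x = 100
k = 3                      # divide by 2**3 == 8
print(x >> k)              # 12  (same as 100 // 8)

# For negative numbers, a right shift floors (rounds toward -infinity),
# which differs from truncating division (rounding toward zero):
print(-100 >> 3)           # -13 (floor of -12.5)
print(int(-100 / 8))       # -12 (truncation toward zero)
```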
• Overflow: If the result of the division is too large for the target type, an overflow condition may occur. This is particularly relevant for signed integer division (for example, dividing the most negative representable value by −1).
Since division is generally slower than addition, subtraction, and multiplication, compilers and hardware apply various optimizations, such as replacing division by a power of two with a shift, and replacing division by other known constants with multiplication by a precomputed reciprocal.
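As an illustration of the reciprocal idea, compilers commonly replace unsigned 32-bit division by the constant 10 with a multiply by a precomputed "magic" constant followed by a shift. The constant below is ceil(2**35 / 10):

```python
# Compiler-style strength reduction: for any unsigned 32-bit x,
# x // 10 == (x * MAGIC) >> 35, where MAGIC = ceil(2**35 / 10).
MAGIC = 0xCCCCCCCD   # == 3435973837

def div10(x):
    """Divide a 32-bit unsigned value by 10 without a divide instruction."""
    return (x * MAGIC) >> 35

print(div10(12345))   # 1234
```

On real hardware this is one widening multiply plus one shift, both far cheaper than a divide instruction.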
Floating-point numbers are used to represent real numbers in computers, allowing for
the approximation of a vast range of values, including very large and very small
numbers. They are essential in scientific computations, graphics, simulations, and many
other fields. Floating-point arithmetic is based on scientific notation, where numbers are
represented as a mantissa (or significand) multiplied by a power of 2 (in binary
systems), rather than a fixed point.
1. Floating-Point Representation
The IEEE 754 standard specifies how floating-point numbers are encoded into binary. It
divides a floating-point number into three components:
1. Sign (S): A single bit that indicates whether the number is positive (0) or negative (1).
2. Exponent (E): A biased exponent value, which determines the magnitude of the
number.
3. Mantissa (M): Also called the significand, it represents the precision of the
number.
The value of a normalized number is (−1)^S × 1.M × 2^(E − bias), where:
• E is the exponent, with a bias to allow for both positive and negative exponents.
• Bias: For single precision, the bias is 127; for double precision, the bias is 1023.
For example, a positive number whose normalized binary form is 1.1011... × 2^e is encoded with:
• S = 0
• Mantissa = 1.1011...
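The three fields can be inspected directly. Below is a small sketch using Python's struct module to decode a single-precision encoding; the value 6.75 = 1.1011 (binary) × 2^2 is chosen purely for illustration:

```python
import struct

def fields(x):
    """Decode the sign, exponent, and mantissa fields of x as stored
    in IEEE 754 single precision."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # biased by 127
    mantissa = bits & 0x7FFFFF       # 23 stored bits; leading 1 is implicit
    return sign, exponent, mantissa

s, e, m = fields(6.75)               # 6.75 = +1.1011 (binary) * 2**2
print(s, e - 127, bin(m))            # 0 2 0b10110000000000000000000
```

The stored exponent is 129 (= 127 + 2), and the mantissa field holds 1011 followed by 19 zeros, with the leading 1 implied.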
2. Floating-Point Operations
1. Align the exponents: If the exponents of the two operands are different, the
mantissa of the smaller exponent is shifted to the right until both exponents are
the same. This may involve rounding and truncation.
2. Add or subtract the mantissas: Once the exponents are aligned, the mantissas
can be added or subtracted. If subtracting, care must be taken for cases where
the result could lead to underflow or loss of precision.
3. Normalize the result: The result may need to be normalized by adjusting the
exponent and mantissa such that the mantissa lies within the valid range (1 ≤ M
< 2 for normalized numbers).
4. Round the result: Finally, rounding may occur to fit the result back into the
fixed precision of the system.
For example, when adding two numbers whose exponents differ by 2, the exponents are aligned first (shifting the mantissa of the smaller number right by 2 places), and then the mantissas are added.
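The four steps can be sketched on (mantissa, exponent) pairs. This is a toy model that assumes positive operands and a nonzero result, and ignores rounding and special cases; the function name fp_add is illustrative:

```python
import math

def fp_add(a, b):
    """Add two values given as (mantissa, exponent) pairs with
    1 <= mantissa < 2: align, add, then normalize."""
    (ma, ea), (mb, eb) = a, b
    if ea < eb:                  # make `a` the operand with the larger exponent
        (ma, ea), (mb, eb) = (mb, eb), (ma, ea)
    mb = mb / (2 ** (ea - eb))   # step 1: shift the smaller mantissa right
    m = ma + mb                  # step 2: add the aligned mantissas
    f, s = math.frexp(m)         # step 3: renormalize (frexp gives f in [0.5, 1))
    return f * 2, ea + s - 1     # rescale so the mantissa is back in [1, 2)

print(fp_add((1.5, 3), (1.25, 1)))   # (1.8125, 3): 12 + 2.5 = 14.5 = 1.8125 * 2**3
```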
b) Multiplication
Floating-point multiplication is simpler than addition because the exponents are added
rather than aligned. The steps are:
1. Multiply the mantissas: The mantissas of the two operands are multiplied together.
2. Add the exponents: The exponents of the operands are added together.
3. Normalize the result: As with addition, the result must be normalized to ensure
the mantissa is within the valid range.
For example, when multiplying 1.23 × 10^4 by 5.67 × 10^2, the mantissas 1.23 and 5.67 are multiplied, and the exponents 4 and 2 are added.
c) Division
1. Divide the mantissas: The mantissa of the dividend is divided by the mantissa of the divisor.
2. Subtract the exponents: The exponent of the divisor is subtracted from the
exponent of the dividend.
3. Normalize the result: Just like in multiplication and addition, the result needs to
be normalized.
4. Round the result: The final result is rounded to fit the precision.
For example, when dividing 1.23 × 10^4 by 5.67 × 10^2, the mantissas 1.23 and 5.67 are divided, and the exponents 4 and 2 are subtracted.
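Both operations can be sketched the same way on (mantissa, exponent) pairs; a toy model that ignores rounding and special cases:

```python
def fp_mul(a, b):
    """Multiply mantissas, add exponents, then normalize (toy model)."""
    (ma, ea), (mb, eb) = a, b
    m, e = ma * mb, ea + eb
    if m >= 2:                # product of two [1,2) mantissas lies in [1,4),
        m, e = m / 2, e + 1   # so at most one normalization shift is needed
    return m, e

def fp_div(a, b):
    """Divide mantissas, subtract exponents, then normalize (toy model)."""
    (ma, ea), (mb, eb) = a, b
    m, e = ma / mb, ea - eb
    if m < 1:                 # quotient of two [1,2) mantissas lies in (0.5, 2),
        m, e = m * 2, e - 1   # so at most one normalization shift is needed
    return m, e

print(fp_mul((1.5, 3), (1.5, 2)))    # (1.125, 6): 12 * 6 = 72 = 1.125 * 2**6
print(fp_div((1.5, 3), (1.25, 1)))   # (1.2, 2):  12 / 2.5 = 4.8 = 1.2 * 2**2
```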
Several special cases are defined by the IEEE 754 standard to handle specific situations:
a) Zero
A floating-point number can represent positive zero and negative zero using the sign
bit. These are used to handle cases of underflow or very small numbers that are
approximated as zero.
b) Infinity
Infinity occurs when a number exceeds the largest representable number in the system.
This happens in operations like division by zero or overflow.
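Python's float type follows IEEE 754 double precision, so both special values can be observed directly:

```python
import math

inf = float('inf')
print(inf + 1, inf * 2)        # inf inf  (arithmetic saturates at infinity)
print(1.0 / inf)               # 0.0

# Signed zero: -0.0 compares equal to 0.0 but keeps its sign bit.
print(-0.0 == 0.0)             # True
print(math.copysign(1, -0.0))  # -1.0
```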
Floating-point numbers have limited precision because they are stored in a fixed number
of bits. This can lead to rounding errors, where the result of an operation cannot be
represented exactly. Common rounding methods include: