UNIT4 Computer Arithmetic Sent
Unit 4
Computer Arithmetic
➢In sign-magnitude representation, there are two representations (0000 0000B and 1000 0000B) for the
number zero, which can lead to inefficiency and confusion.
➢In 1's complement representation, there are also two representations (0000 0000B and 1111 1111B) for zero.
Algorithm
Big Endian and Little Endian
➢Modern computers store one byte of data in each memory address or
location, i.e., byte-addressable memory. A 32-bit integer is, therefore,
stored in 4 memory addresses.
➢The term "Endian" refers to the order of storing bytes in computer
memory.
➢In the "Big Endian" scheme, the most significant byte is stored first, in the
lowest memory address ("big end in first").
➢The "Little Endian" scheme stores the least significant byte in the lowest
memory address.
➢For example, the 32-bit integer 12345678H is stored as
✓12H 34H 56H 78H in big endian
✓78H 56H 34H 12H in little endian
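The two byte orders can be checked with a short Python sketch (Python is only an illustrative choice here). Both `int.to_bytes` and the `struct` module accept an explicit byte order; index 0 of the resulting `bytes` object plays the role of the lowest memory address.

```python
import struct
import sys

value = 0x12345678

# Lay the 32-bit integer out in 4 bytes; index 0 models the lowest address.
big = value.to_bytes(4, byteorder="big")
little = value.to_bytes(4, byteorder="little")

print(" ".join(f"{b:02X}H" for b in big))     # 12H 34H 56H 78H
print(" ".join(f"{b:02X}H" for b in little))  # 78H 56H 34H 12H

# struct uses ">" for big endian and "<" for little endian.
assert struct.pack(">I", value) == big
assert struct.pack("<I", value) == little

# Native byte order of the machine running this code (machine-dependent).
print("this machine is", sys.byteorder, "endian")
```

Most desktop processors report "little" here, which is why multi-byte values read from files or network packets usually need an explicit byte order.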
Hardware for Multiplication Algorithm
➢Example:
✓Multiplier = 001110 (+14) has a string of 1's from 2^3 to 2^1 (k = 3, m = 1).
The number can be represented as 2^(k+1) – 2^m = 2^4 – 2^1 = 16 – 2 = 14.
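The string-of-ones property is what Booth recoding exploits. The sketch below is a software model, not the register-level hardware: assuming the multiplier is an n-bit two's-complement pattern, it recodes bit position i as q(i-1) - q(i) (with q(-1) = 0) and then adds or subtracts shifted copies of the multiplicand. The function names are hypothetical.

```python
def booth_digits(q: int, n: int) -> list[int]:
    """Booth-recode an n-bit two's-complement multiplier, LSB first.

    Digit i is q(i-1) - q(i) with q(-1) = 0, so a string of 1s from bit k
    down to bit m becomes -1 at position m and +1 at position k+1
    (when k+1 < n; a string reaching the sign bit keeps only the -1).
    """
    digits = []
    prev = 0                       # q(-1) = 0
    for i in range(n):
        bit = (q >> i) & 1
        digits.append(prev - bit)  # -1, 0 or +1
        prev = bit
    return digits


def booth_multiply(multiplicand: int, multiplier: int, n: int) -> int:
    """Multiply using the Booth-recoded digits of an n-bit multiplier."""
    q = multiplier & ((1 << n) - 1)  # n-bit two's-complement pattern
    product = 0
    for i, d in enumerate(booth_digits(q, n)):
        if d == 1:
            product += multiplicand << i  # add shifted multiplicand
        elif d == -1:
            product -= multiplicand << i  # subtract shifted multiplicand
    return product


print(booth_digits(0b001110, 6))       # [0, -1, 0, 0, 1, 0]  i.e. -2^1 + 2^4
print(booth_multiply(5, 0b001110, 6))  # 70
```

For 001110 only two non-zero digits remain, so one subtraction and one addition replace three additions.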
➢Division algorithms are of two types:
✓Restoring Algorithm
✓Non-restoring Algorithm
Non-Restoring Division Algorithm
➢It combines the restore (or no-restore) step of one cycle with the shift-left step
of the next cycle, which reduces the number of operations.
➢Note: B = divisor (n+1 bits), Q = dividend (n bits), A = remainder (n+1
bits, initially 0)
➢Procedure:
✓Step 1: Do the following n times:
i) If the sign of A (An) is 0, shift A and Q left one bit position and subtract B from
A; otherwise, shift A and Q left and add B to A.
ii) Now, if the sign of A is 0, set Q0 to 1; otherwise, set Q0 to 0.
✓Step 2: If the sign of A is 1, add B to A. The quotient is in Q and the remainder is in A.
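A minimal Python sketch of the procedure for unsigned operands. Assumptions not in the notes: A is modelled as a signed Python integer instead of an explicit (n+1)-bit register, so "the sign of A is 0" is written `a >= 0`, and `non_restoring_divide` is a hypothetical name.

```python
def non_restoring_divide(dividend: int, divisor: int, n: int) -> tuple[int, int]:
    """Divide unsigned n-bit integers; returns (quotient, remainder).

    A is modelled as a signed Python int rather than an (n+1)-bit register,
    so "the sign of A is 0" is written as a >= 0.
    """
    if not 0 < divisor < (1 << n):
        raise ValueError("divisor must be a non-zero n-bit value")
    mask = (1 << n) - 1
    a, q, b = 0, dividend & mask, divisor
    for _ in range(n):                       # Step 1, repeated n times
        msb_q = (q >> (n - 1)) & 1
        sign_a_is_0 = a >= 0                 # test A before the shift
        a = 2 * a + msb_q                    # shift A and Q left together
        q = (q << 1) & mask                  # Q0 is now 0
        a = a - b if sign_a_is_0 else a + b  # subtract or add B
        if a >= 0:
            q |= 1                           # set Q0 to 1
    if a < 0:                                # Step 2: final correction
        a += b
    return q, a


print(non_restoring_divide(8, 3, 4))  # (2, 2)
```

Unlike restoring division, no cycle ever undoes its subtraction; only the final remainder may need one correcting addition.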
➢E = e + 8 (bias), where e is the true exponent and E is the stored (biased) exponent.
➢If the exponent is represented by k bits, the bias is 2^(k-1), and the exponent is
stored in excess-bias form. In the example above, k = 4, so the bias is 8 (Excess-8 form).
Normalized Mantissa
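➢Only one digit is placed to the left of the point: a non-zero digit (1 in binary) in implicit
normalization, e.g., 1.1011B × 2^3, or 0 in explicit normalization, where the first digit after
the point is non-zero, e.g., 0.11011B × 2^4.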
➢Note: The bias is added to the true exponent so that negative exponents can be
stored without a separate sign bit for the exponent.
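The excess-bias rule E = e + 2^(k-1) can be sketched as two hypothetical Python helpers, `to_excess` and `from_excess`, assuming the stored exponent is an unsigned k-bit field written as a bit string.

```python
def to_excess(e: int, k: int) -> str:
    """Store true exponent e in a k-bit field with bias 2^(k-1)."""
    bias = 1 << (k - 1)            # bias = 2^(k-1)
    stored = e + bias              # E = e + bias
    if not 0 <= stored < (1 << k):
        raise ValueError("exponent out of range for k bits")
    return format(stored, f"0{k}b")


def from_excess(field: str) -> int:
    """Recover the true exponent from a k-bit excess-2^(k-1) field."""
    k = len(field)
    return int(field, 2) - (1 << (k - 1))


print(to_excess(-3, 4))       # 0101   (excess-8: -3 + 8 = 5)
print(to_excess(4, 6))        # 100100 (excess-32: 4 + 32 = 36)
print(from_excess("100100"))  # 4
```

Because every stored field is non-negative, two exponents can be compared as plain unsigned numbers.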
Example
➢Q. A 16-bit register stores a floating-point number. The mantissa is
normalized and the exponent is represented in excess-32 form. What is the
16-bit value of +(13.5)D in the register?
➢Solution: +(13.5)D = (1101.1)B.
✓Explicit normalization (default): value = 0.11011B × 2^4
✓M = 11011 (Mantissa), stored in 9 bits as 110110000
✓e = 4, E = 4 + 32 = 36D = 100100B (Exponent)
✓Excess-32 → bias = 32 → 2^(k-1) = 32 → k = 6
✓16-bit representation: 1 bit (Sign), 6 bits (Exponent), 9 bits (Mantissa)
✓+(13.5)D = 0 100100 110110000 = 0100100110110000
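The steps of this example can be reproduced with the Python sketch below. It assumes the layout used here (1 sign bit, 6-bit excess-32 exponent, 9-bit explicitly normalized mantissa 0.1xxxxxxxx) and, to stay short, rejects values that would need rounding instead of rounding them. `encode16` is a hypothetical name.

```python
from fractions import Fraction


def encode16(x) -> str:
    """Sign (1) | excess-32 exponent (6) | mantissa 0.1xxxxxxxx (9), no rounding."""
    f = Fraction(x)
    if f == 0:
        raise ValueError("zero cannot be normalized in this format")
    sign = 1 if f < 0 else 0
    f = abs(f)
    e = 0
    while f >= 1:                # move the point left
        f /= 2
        e += 1
    while f < Fraction(1, 2):    # move the point right
        f *= 2
        e -= 1
    m = f * 2**9                 # the 9 bits after the point
    if m.denominator != 1:
        raise ValueError("value needs more than 9 mantissa bits")
    stored = e + 32              # E = e + bias
    if not 0 <= stored <= 63:
        raise ValueError("exponent out of range")
    return f"{sign}{stored:06b}{int(m):09b}"


print(encode16(13.5))  # 0100100110110000
```

Using `Fraction` keeps every step exact, so the bits match the hand calculation rather than a rounded float.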
➢Maximum value = (-1)^0 × 0.111111111B × 2^(63-32), since the largest 6-bit exponent is 2^6 - 1 = 63
✓0.111111111B × 2^31 = (2^9 - 1) × 2^-9 × 2^31 = (2^9 - 1) × 2^22 ≈ 2^31
➢Minimum value = -(2^9 - 1) × 2^22 ≈ -2^31
➢Smallest possible +ve value: 0 (S), 000000 (Exponent), 100000000
(Mantissa)
➢Value = +0.1B × 2^(0-32) = 2^-33
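The extreme values can be checked by decoding bit patterns exactly with Python's `fractions.Fraction`. `decode16` is a hypothetical helper for the same 1/6/9-bit excess-32 layout with mantissa value 0.M.

```python
from fractions import Fraction


def decode16(bits: str) -> Fraction:
    """Decode S | E (6 bits, excess-32) | M (9 bits, value 0.M) exactly."""
    assert len(bits) == 16
    sign = -1 if bits[0] == "1" else 1
    e = int(bits[1:7], 2) - 32              # true exponent
    m = Fraction(int(bits[7:], 2), 2**9)    # 0.M as a fraction
    return sign * m * Fraction(2) ** e


largest = decode16("0" + "111111" + "111111111")
smallest_pos = decode16("0" + "000000" + "100000000")

print(largest == (2**9 - 1) * 2**22)        # True
print(largest == 2**31 - 2**22)             # True
print(smallest_pos == Fraction(1, 2**33))   # True
print(decode16("0100100110110000"))         # 27/2
```

The last line decodes the answer of the previous example back to 13.5.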
➢Disadvantages of the conventional representation:
✓It cannot represent zero.
✓It cannot represent infinity.
✓It cannot represent numbers that are not normalized.
IEEE-754 32-bit Single-Precision Floating-Point Numbers
➢In 32-bit single-precision floating-point representation:
✓ The most significant bit is the sign bit (S), with 0 for positive numbers and 1 for
negative numbers.
✓The following 8 bits represent the exponent (E).
✓The remaining 23 bits represent the fraction (F).
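The three fields can be extracted from a real single-precision value with Python's `struct` module, which packs a float into the 4-byte IEEE-754 binary32 format. `float32_fields` is a hypothetical helper name.

```python
import struct


def float32_fields(x: float) -> tuple[int, int, int]:
    """Return (S, E, F) of x rounded to IEEE-754 single precision."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31               # 1 bit
    exponent = (bits >> 23) & 0xFF  # 8 bits
    fraction = bits & 0x7FFFFF      # 23 bits
    return sign, exponent, fraction


print(float32_fields(-1.0))  # (1, 127, 0)
```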
IEEE-754 32-bit Single-Precision Floating-Point Numbers
(Normalized Form)
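➢The sign bit represents the sign of the number. For 1 ≤ E ≤ 254, N = (-1)^S × 1.F × 2^(E-127):
the exponent is stored in excess-127 form, and the significand (1.F) is normalized with an
implicit leading 1 that is not stored.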
Example
➢Q. Which number does the IEEE-754 32-bit floating-point pattern 1 00000000 000
0000 0000 0000 0000 0001 represent?
✓Sign bit S = 1 ⇒ negative number
✓E = 0 (in de-normalized form)
✓Fraction is 0.000 0000 0000 0000 0000 0001B (with an implicit leading 0) = 1×2^-23
✓The number is -2^-23 × 2^-126 = -2^-149 ≈ -1.4 × 10^-45
Note:- For E = 255, it represents special values, such as ±INF (positive and negative
infinity) and NaN (not a number).
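The result of this example can be confirmed in Python by reinterpreting the same 32 bits as a single-precision float with `struct` (a check, not part of the original solution).

```python
import struct

# S = 1 | E = 00000000 | F = 22 zeros followed by a 1
pattern = "1" + "00000000" + "0" * 22 + "1"
bits = int(pattern, 2)

value = struct.unpack(">f", bits.to_bytes(4, "big"))[0]

print(len(pattern))             # 32
print(value == -(2.0 ** -149))  # True
print(f"{value:.1e}")           # -1.4e-45
```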
IEEE-754 32-bit Single-Precision Floating-Point Numbers
(De-normalized Form)
➢For E = 0, N = (-1)^S × 0.F × 2^(-126). Numbers in this form are called de-normalized.
The factor 2^-126 is a very small number.
➢The de-normalized form is needed to represent zero (with F = 0 and E = 0). It can
also represent very small positive and negative numbers close to zero.
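Putting the normalized, de-normalized and special cases together, the Python sketch below decodes a 32-bit pattern exactly with `fractions.Fraction` and compares it against `struct`. It assumes the standard single-precision bias of 127 (normalized: N = (-1)^S × 1.F × 2^(E-127)) and, for brevity, returns special values as strings and maps -0 to 0. `decode_float32` and `via_struct` are hypothetical names.

```python
import struct
from fractions import Fraction


def decode_float32(bits: int):
    """Decode a 32-bit IEEE-754 pattern; returns a Fraction or a string."""
    s = (bits >> 31) & 1
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    sign = -1 if s else 1
    if e == 255:                      # special values
        if f:
            return "NaN"
        return "-INF" if s else "+INF"
    if e == 0:                        # de-normalized: (-1)^S x 0.F x 2^-126
        return sign * Fraction(f, 2**23) * Fraction(1, 2**126)
    # normalized: (-1)^S x 1.F x 2^(E-127)
    return sign * (1 + Fraction(f, 2**23)) * Fraction(2) ** (e - 127)


def via_struct(bits: int) -> float:
    return struct.unpack(">f", bits.to_bytes(4, "big"))[0]


for pattern in (0x00000001, 0x007FFFFF, 0x00800000, 0x41CC0000, 0x80000001):
    assert decode_float32(pattern) == Fraction(via_struct(pattern))

print(decode_float32(0x41CC0000))  # 51/2
print(decode_float32(0x7F800000))  # +INF
```

Patterns 0x00000001 and 0x007FFFFF are the smallest and largest positive de-normalized values; 0x00800000 is the smallest normalized one, 2^-126.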
IEEE-754 64-bit Double-Precision Floating-Point Numbers
➢The representation scheme for 64-bit double-precision is similar to the 32-
bit single-precision:
✓ The most significant bit is the sign bit (S), with 0 for positive numbers and 1 for
negative numbers.
✓ The following 11 bits represent the exponent (E).
✓ The remaining 52 bits represent the fraction (F).
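The same field split for double precision, again sketched with Python's `struct` module (`float64_fields` is a hypothetical helper name):

```python
import struct


def float64_fields(x: float) -> tuple[int, int, int]:
    """Return (S, E, F) of an IEEE-754 double: 1 + 11 + 52 bits."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF     # 11 bits
    fraction = bits & ((1 << 52) - 1)   # 52 bits
    return sign, exponent, fraction


print(float64_fields(-2.0))  # (1, 1024, 0)
```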
IEEE-754 64-bit Double-Precision Floating-Point
Numbers (Normalized Form)
➢1 ≤ E ≤ 2046 with excess of 1023. The actual exponent is from -1022 to
+1023, and
➢ N = (-1)^S × 1.F × 2^(E-1023)
➢Example 1: Normalized maximum floating point number (N(max))
✓7FEF FFFF FFFF FFFFH
✓N(max) = 1.1...1B × 2^1023 = (2 - 2^-52) × 2^1023
✓(≈1.7976931348623157 × 10^308)
➢Example 2: Normalized minimum floating point number (N(min))
✓0010 0000 0000 0000H
✓N(min) = 1.0B × 2^-1022
✓(≈2.2250738585072014 × 10^-308)
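Both examples can be confirmed in Python by reinterpreting the hex patterns with `struct` and comparing them with the formulas and with `sys.float_info`, which reports the largest and smallest normalized doubles on platforms whose floats are IEEE-754 binary64 (practically all current ones). `from_hex64` is a hypothetical helper.

```python
import struct
import sys


def from_hex64(h: str) -> float:
    """Reinterpret a 64-bit pattern written in hex (spaces allowed) as a double."""
    return struct.unpack(">d", bytes.fromhex(h.replace(" ", "")))[0]


n_max = from_hex64("7FEF FFFF FFFF FFFF")
n_min = from_hex64("0010 0000 0000 0000")

print(n_max == (2 - 2**-52) * 2**1023)  # True
print(n_min == 2**-1022)                # True
print(n_max == sys.float_info.max)      # True
print(n_min == sys.float_info.min)      # True
print(repr(n_max))                      # 1.7976931348623157e+308
```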
IEEE-754 64-bit Double-Precision Floating-Point
Numbers (De-normalized Form)
➢For E = 0, N = (-1)^S × 0.F × 2^(-1022).
➢Example 1: De-normalized maximum floating point number (D(max))
✓000F FFFF FFFF FFFFH
✓D(max) = 0.1...1 × 2^-1022 = (1-2^-52)×2^-1022
✓(≈2.2250738585072009 × 10^-308)
➢Example 2: De-normalized minimum floating point number (D(min))
✓0000 0000 0000 0001H
✓D(min) = 0.0...1 × 2^-1022 = 1 × 2^-52 × 2^-1022 = 2^-1074
✓(≈4.9 × 10^-324)
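A similar Python check for the de-normalized extremes, with `from_hex64` as a hypothetical helper. It also decodes 001F FFFF FFFF FFFFH to show that a pattern whose exponent field is 1 (not 0) is a normalized number, about 4.45 × 10^-308, rather than a de-normalized one.

```python
import struct
import sys


def from_hex64(h: str) -> float:
    """Reinterpret a 64-bit pattern written in hex (spaces allowed) as a double."""
    return struct.unpack(">d", bytes.fromhex(h.replace(" ", "")))[0]


d_max = from_hex64("000F FFFF FFFF FFFF")  # E = 0, F = all ones
d_min = from_hex64("0000 0000 0000 0001")  # E = 0, F = 0...01

print(d_max == (1 - 2**-52) * 2**-1022)    # True
print(d_min == 2**-1074)                   # True
print(d_max < sys.float_info.min)          # True: below the smallest normalized double

# Exponent field 1: a normalized number, 1.1...1B x 2^-1022.
print(from_hex64("001F FFFF FFFF FFFF") == (2 - 2**-52) * 2**-1022)  # True
```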
Note: For E = 2047, N represents special values, such as ±INF (infinity), NaN (not a
number).
Problem1
➢Convert 25.5 to IEEE 754 Single Precision (32-bit) format:
➢Convert to binary: integer part 25 → 11001, fractional part .5 → .1;
combined binary: 11001.1
➢ Normalize: 11001.1 = 1.10011 × 2⁴ (the binary point moves 4 positions to the left)
➢ Exponent = 4; biased exponent = 127 + 4 = 131 → binary: 10000011
➢Mantissa (fractional part after leading 1.): 10011000000000000000000
(pad to 23 bits)
➢Sign bit: Positive number → 0
➢Ans: 0 10000011 10011000000000000000000
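The answer can be cross-checked with Python's `struct` module, which rounds 25.5 to single precision (exact here) and exposes its bit pattern.

```python
import struct

(bits,) = struct.unpack(">I", struct.pack(">f", 25.5))
pattern = f"{bits:032b}"

print(pattern[0], pattern[1:9], pattern[9:])  # 0 10000011 10011000000000000000000
print(f"{bits:08X}")                          # 41CC0000
```

The hex form 41CC0000H is a compact way to write the same 32 bits.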
Problem2
➢Convert 25.5 to IEEE 754 Double Precision (64-bit) format:
➢Convert to binary: integer part 25 → 11001, fractional part .5 → .1;
combined binary: 11001.1
➢ Normalize: 11001.1 = 1.10011 × 2⁴ (the binary point moves 4 positions to the left)
➢ Exponent = 4; biased exponent = 1023 + 4 = 1027 → binary: 10000000011
➢Mantissa (fractional part after leading 1.):
1001100000000000000000000000000000000000000000000000 (pad to
52 bits)
➢Sign bit: Positive number → 0
➢Ans: 0 10000000011 1001100000000000000000000000000000000000000000000000
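The same cross-check for double precision in Python; only the struct format (">d" and ">Q") and the field widths change.

```python
import struct

(bits,) = struct.unpack(">Q", struct.pack(">d", 25.5))
pattern = f"{bits:064b}"

sign, exponent, fraction = pattern[0], pattern[1:12], pattern[12:]
print(sign, exponent)                   # 0 10000000011
print(fraction == "10011" + "0" * 47)   # True
print(f"{bits:016X}")                   # 4039800000000000
```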