Real Number Representations


Chapter 4

Real Number Representations


IEEE754 Floating Point (FP)
A floating-point (FP) representation is used to represent real numbers.
• The floating-point representation is encoded in a finite number of bits.

• The IEEE (Institute of Electrical and Electronics Engineers) developed the Floating-Point Standard 754 to represent real numbers.

• It was developed in 1985 to standardize computation among the various computer manufacturers.

• IEEE 754 dictates the precision, accuracy, and arithmetic operations that must be implemented in conforming processors.
Aside
A fixed-point (FXP) representation is used to represent signed integer numbers.
• The fixed-point representation is encoded in a finite number of bits.

• Unlike floating point, fixed-point integers are not defined by IEEE 754; signed integers are conventionally stored in 2's complement form.

FXP Number: S | 2's complement integer
Bit positions: S at bit n-1; integer bits n-2 down to 0

Single Precision FXP Number: 32-bit

Double Precision FXP Number: 64-bit

IEEE754 Binary Floating-Point (BFP)
The representation of an IEEE 754 BFP number consists of three parts:
BFP Number: S | EXPONENT (E) | FRACTION (F)
Bit positions: S at bit n-1; F ends at bit 0
An IEEE 754 BFP number X is represented by three fields:

i) Sign Sx: a single sign bit that indicates whether the FP number X is positive or negative (Sx = 0 means X is positive; Sx = 1 means X is negative).
ii) Exponent Ex: used to adjust the position of the binary point (as opposed to a "decimal" point).
- The number of bits in the exponent field (Ex) depends on the format used.
- The exponent is a signed integer value that is stored in biased form, using a bias B.

The bias B is given by

B = 2^{ne - 1} - 1    (4.1)
where ne is the number of exponent bits in the FP format.
Note: The bias B is added so that every stored exponent Ex in the BFP representation is a non-negative number.
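As a small illustration of equation (4.1), the sketch below computes the bias for 8-bit and 11-bit exponent fields and shows how a signed exponent maps to a non-negative stored value. The function names `bias` and `biased_exponent` are hypothetical, chosen only for this example, and the sample exponents -126, 0, and 127 are assumed to lie in the single precision range.

```python
def bias(ne: int) -> int:
    """Bias B = 2^(ne - 1) - 1 for an exponent field of ne bits (eq. 4.1)."""
    return 2 ** (ne - 1) - 1


def biased_exponent(e: int, ne: int) -> int:
    """Stored (biased) exponent Ex = e + B for a signed exponent e."""
    return e + bias(ne)


print("bias for ne = 8 :", bias(8))
print("bias for ne = 11:", bias(11))

# Signed exponents become non-negative stored values once the bias is added.
for e in (-126, 0, 127):
    print(f"e = {e:5d}  ->  Ex = {biased_exponent(e, 8)}")
```

Because the stored exponent is never negative, it can be held in an unsigned field of ne bits.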

iii) Magnitude Mx: the IEEE754 BFP standard calls the magnitude (Mx) the normalized significand (or mantissa).

What is the meaning of normalized significand Mx?

It means that the biased exponent Ex is chosen such that the highest-order bit (the integer bit) of the significand Mx is 1 (except for the value zero).

Thus, the normalized significand is represented by

Mx = 1.F, with 1 ≤ Mx < 2    (4.2)
where
F is the fraction of the real number and consists of m bits. The number of bits of F depends on the format used.
F = f_{-1} f_{-2} f_{-3} ... f_{-m}
Thus the normalized mantissa is

Mx = 1.F = 1.f_{-1} f_{-2} f_{-3} ... f_{-m}    (4.3)

Note: The most significant 1 (the integer bit) is a hidden bit (i.e., this integer bit is not stored in IEEE754 BFP registers).
Normalized Representation of IEEE754 BFP Number
S | EXPONENT | FRACTION
Sx | Ex | Fx
The three fields are packed into one word in the order Sx, Ex, Fx, such that:

X = (-1)^{Sx} · (1.Fx) · 2^{Ex - B}    Normalized Number (4.4)

Let ex = Ex - B, where
ex : the unbiased exponent
Ex : the biased exponent
B : the bias (a constant whose value depends on the IEEE754 format used to represent the BFP number)

Normalized Number

X = (-1)^{Sx} · (1.Fx) · 2^{ex}    (4.5)
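To make the normalized-number formula concrete, the following sketch splits a 32-bit single precision word into its three fields and then rebuilds the value from them. It assumes the single precision layout (1 sign bit, 8 exponent bits, 23 fraction bits, bias 127) and handles normalized numbers only; the helper names `decode_single` and `value_from_fields` are hypothetical.

```python
import struct


def decode_single(x: float):
    """Return (Sx, Ex, Fx) as integers for x stored in single precision."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sx = bits >> 31            # 1 sign bit
    ex = (bits >> 23) & 0xFF   # 8-bit biased exponent
    fx = bits & 0x7FFFFF       # 23-bit fraction (hidden 1 is not stored)
    return sx, ex, fx


def value_from_fields(sx: int, ex: int, fx: int) -> float:
    """Rebuild X = (-1)^Sx * (1.Fx) * 2^(Ex - B) for a normalized number."""
    significand = 1 + fx / 2 ** 23        # restore the hidden integer bit
    return (-1) ** sx * significand * 2.0 ** (ex - 127)


sx, ex, fx = decode_single(-6.5)          # -6.5 = -1.101 (binary) x 2^2
print(sx, ex, format(fx, "023b"))
print(value_from_fields(sx, ex, fx))
```

For -6.5 the fields are Sx = 1, Ex = 129 (that is, ex = 2), and a fraction that begins with the bits 101.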


IEEE754 Binary Floating Point (BFP) Format
The IEEE754 storage format specifies how a BFP number is stored in memory and in the registers of the BFP unit.

• The IEEE 754 BFP standard defines two basic formats:

a) Single Precision (32-bit)
b) Double Precision (64-bit)
Extended formats for each of these two basic formats are also used.
Figure (4.1) shows the data formats supported by the IEEE 754.

Fig. (4.1) Data formats supported by the IEEE 754 FP standard representation (bit layouts, most significant bit first: 32-bit word with bits 31 | 30-23 | 22-0; 64-bit word with bits 63 | 62-52 | 51-0; plus the extended formats).
IEEE754 BFP Formats
Single Precision (Short) Format (32-bit):
Sign: 1 bit
Exponent: 8 bits, bias = 127, range -126 to 127
Significand: 23 bits for fractional part (plus hidden 1 in integer part)

Double Precision (Long) Format (64-bit):
Sign: 1 bit
Exponent: 11 bits, bias = 1023, range -1022 to 1023
Significand: 52 bits for fractional part (plus hidden 1 in integer part)
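The exponent range of the single precision format can be checked by looking at the bit patterns of the smallest and largest normalized numbers. This is only a sketch using Python's standard `struct` module; the helper name `single_bits` is hypothetical, and the two test values are built directly from the range -126 to 127 and the 23-bit fraction stated above.

```python
import struct


def single_bits(x: float) -> int:
    """Return the 32-bit single precision pattern of x as an integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


smallest_normal = 2.0 ** -126                   # 1.000...0 x 2^-126
largest_normal = (2 - 2.0 ** -23) * 2.0 ** 127  # 1.111...1 x 2^127

for name, x in (("smallest", smallest_normal), ("largest", largest_normal)):
    bits = single_bits(x)
    biased = (bits >> 23) & 0xFF
    print(f"{name}: bits = {bits:#010x}, Ex = {biased}, ex = {biased - 127}")
```

The stored exponents of these two values are 1 and 254, so the all-zeros and all-ones exponent patterns are left free for other uses.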
Special Values
These are bit patterns that do not represent ordinary real numbers in the BFP system, but are useful for representing ±∞ and Not a Number (NaN).

NaN is a special value that is useful for representing undefined results, such as 0/0 and the square root of a negative number, or for marking uninitialized variables.

Example 1: If the biased exponent of X is all 1s:

Ex = 11111111 (for single precision format: Ex is 8 bits)
Ex = 11111111111 (for double precision format: Ex is 11 bits)

and the fraction F is nonzero, then X is NaN.

Example 2: Suppose Y is represented in single precision BFP format.

* If all the bits of the biased exponent EY are equal to 1 and all the fraction bits (F) are equal to 0, then the number Y is either −∞ or +∞, depending on the sign SY:

Biased exponent:
EY = 11111111 (EY is 8 bits) ≡ (255)_10
Fraction:
F = 00000000...000000 (F is 23 bits)

Then Y = ±∞
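The rules for special values can be sketched as a small classifier over a 32-bit single precision pattern. The function name `classify_single` is hypothetical, and the sketch only separates ±∞ and NaN from all other patterns, assuming that an all-1s exponent with a nonzero fraction is a NaN.

```python
def classify_single(bits: int) -> str:
    """Classify a 32-bit single precision pattern given as an integer."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # 8-bit biased exponent
    fraction = bits & 0x7FFFFF       # 23-bit fraction
    if exponent == 0xFF:             # all exponent bits are 1
        if fraction == 0:
            return "-inf" if sign else "+inf"
        return "NaN"
    return "finite"


for pattern in (0x7F800000, 0xFF800000, 0x7FC00000, 0x3F800000):
    print(f"{pattern:#010x} -> {classify_single(pattern)}")
```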
[Table: Features of IEEE754 BFP Floating-Point Formats]
Exceptions
Five types of exceptions are defined in the IEEE 754 BFP Standard. By default, these exceptions set flags and computation continues. The exceptions are:
1- Overflow (exponent): occurs when the result is too large to be represented.
2- Underflow (exponent): occurs when the nonzero magnitude of the result is too small to be represented.
3- Division by Zero: occurs when a finite nonzero number is divided by zero.
4- Inexact: occurs when the infinite-precision result differs from the FP number that is stored.
5- Invalid: set when an operation has no defined result (such as 0/0) and a NaN result is produced.
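The "set a flag and continue" behavior can be observed with Python's standard `decimal` module. Note the assumption: `decimal` implements decimal rather than binary floating point, so this is only an analogy for the BFP exceptions listed above, and its signal for the Invalid case is named `InvalidOperation`. The helper `run` is hypothetical.

```python
from decimal import (Decimal, ExtendedContext, Overflow, Underflow,
                     DivisionByZero, Inexact, InvalidOperation)

SIGNALS = (Overflow, Underflow, DivisionByZero, Inexact, InvalidOperation)


def run(op: str, a: str, b: str):
    """Apply ctx.<op>(a, b) with no traps enabled; return (result, flags set)."""
    ctx = ExtendedContext.copy()   # ExtendedContext has all traps disabled
    ctx.clear_flags()
    result = getattr(ctx, op)(Decimal(a), Decimal(b))
    raised = [sig.__name__ for sig in SIGNALS if ctx.flags[sig]]
    return result, raised


cases = [
    ("multiply", "1E+999999", "1E+999999"),  # result too large
    ("multiply", "1E-999999", "1E-999999"),  # nonzero result too small
    ("divide", "1", "0"),                    # finite nonzero / zero
    ("divide", "1", "3"),                    # result must be rounded
    ("divide", "0", "0"),                    # no defined result
]
for op, a, b in cases:
    result, raised = run(op, a, b)
    print(f"{op}({a}, {b}) = {result}, flags: {raised}")
```

In every case a result is returned and the computation carries on; the flags record what happened along the way.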
