CH03 Data II

This document covers the representation of fractional numbers in binary, focusing on the IEEE floating point standard and its properties. It discusses the limitations of precision in both fractional decimal and binary numbers, as well as computation over floating point numbers. Additionally, it explains normalized, denormalized, and special values in floating point representation, along with the distribution of encoded values and rounding issues.

CH03

Data Representation II
COMP1411: Introduction to Computer Systems

Dr. Mingsong LYU ( 呂鳴松 )


Department of Computing,
The Hong Kong Polytechnic University
Spring 2025

Acknowledgement: These slides are based on the textbook (Computer Systems: A Programmer's Perspective) and its accompanying slides.
These slides are intended for internal use only. Do not publish them anywhere without permission.
Overview
• Representing fractional numbers with binary
• IEEE floating point standard
• Examples and properties
• Computation over floating point numbers
Rethinking fractional decimal numbers

425.367 = 4*10^2 + 2*10^1 + 5*10^0 + 3*10^-1 + 6*10^-2 + 7*10^-3

• Limitation on precision
• Given a fixed number of digits in the fractional part, precision is fixed. In the above example, 3 digits in the fractional part means we can only represent a unit as small as 0.001 --- a precision limitation
• If we want arbitrary precision, we need an infinite number of digits

• Limitation of representations
• Given 6 digits in total to represent a (fractional) number:

Option  | Max     | Precision
X.XXXXX | 9.99999 | 0.00001
XX.XXXX | 99.9999 | 0.0001
XXX.XXX | 999.999 | 0.001
XXXX.XX | 9999.99 | 0.01
XXXXX.X | 99999.9 | 0.1
Fractional binary numbers

1011.101 (base 2) = 1*2^3 + 0*2^2 + 1*2^1 + 1*2^0 + 1*2^-1 + 0*2^-2 + 1*2^-3

Binary number | Decimal value
0.001         | 0.125
0.010         | 0.25
0.011         | 0.375
0.100         | 0.5
0.101         | 0.625
0.110         | 0.75
0.111         | 0.875

Precision limitation: 2^-3 (1/8)
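The positional expansion above can be checked with a short script. A minimal sketch (the helper name `bin_frac_to_decimal` is ours, not part of the slides):

```python
def bin_frac_to_decimal(s: str) -> float:
    """Convert a fractional binary string such as '1011.101' to decimal."""
    int_part, _, frac_part = s.partition('.')
    # Read all bits as one integer, then divide by 2^(number of fraction bits)
    return int(int_part + frac_part, 2) / 2 ** len(frac_part)

print(bin_frac_to_decimal('1011.101'))  # 11.625 = 8 + 2 + 1 + 1/2 + 1/8
print(bin_frac_to_decimal('0.111'))     # 0.875, the last row of the table
```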


Fractional binary numbers

In general, a binary number

b_i b_(i-1) ... b_2 b_1 b_0 . b_(-1) b_(-2) b_(-3) ... b_(-j)

has the value

sum of b_k * 2^k, for k from -j up to i

Bits to the left of the binary point carry weights 2^0 = 1, 2^1 = 2, 2^2 = 4, ..., 2^i; bits to the right carry weights 2^-1 = 1/2, 2^-2 = 1/4, 2^-3 = 1/8, ..., 2^-j.

Precision limit: 1 / 2^j
Representable numbers
• Limitation #1
• Can only exactly represent numbers of the form x/2^k (k: number of fractional bits)
• Values that cannot be exactly represented:
• 1/3 = 0.0101010101[01]... (base 2)
• 1/5 = 0.001100110011[0011]... (base 2)
• 1/10 = 0.0001100110011[0011]... (base 2)

• Limitation #2
• Given a fixed number of bits, limited range of numbers
• More integer bits, or more fraction bits?
• Larger value, or better precision?
• We are running out of bits …

Where should the binary point go? Placing it further to the right covers larger values; placing it further to the left gives a more precise fraction.
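Limitation #1 is easy to observe in any language with IEEE doubles. A quick check in Python, using `decimal.Decimal` to print the exact value actually stored:

```python
from decimal import Decimal

# 1/10 has no finite binary expansion, so the stored double is only close to 0.1
print(Decimal(0.1))       # 0.1000000000000000055511151231257827021181583404541015625
print(0.1 + 0.2 == 0.3)   # False: each side carries a different rounding error
print(0.5 + 0.25 == 0.75) # True: these have the form x/2^k, exactly representable
```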
Overview
• Get familiar with fractional binary numbers
• IEEE floating point standard
• Examples and properties
• Computation over floating point numbers
IEEE Floating Point
• IEEE Standard 754
• Before the standard, different machines used different floating point
encodings, making it hard to exchange information
• In 1985, IEEE established a uniform standard for floating point
arithmetic
• Nowadays, supported by all major CPUs

• Driven by numerical concerns


• Nice standards for rounding, overflow, underflow
• Hard to make fast in hardware
• Numerical analysts predominated over hardware designers
in defining standard
Floating Point Representation

s | exp | frac
(total available binary digits to represent a number)

• Any binary fractional number can be written in the following numerical form:

v = (-1)^s * M * 2^E

• Sign bit s determines whether the number is negative or positive
• Significand M is normally a fractional value in the range [1.0, 2.0)
• Exponent E weights the value by a power of two
• v is the resulting decimal value
Normalized values - Definition

s | exp | frac        v = (-1)^s * M * 2^E

• Normalized values: when exp ≠ 000...0 and exp ≠ 111...1

• E = exp - Bias
• exp: B2U (binary-to-unsigned) value read from the exp field
• Bias = 2^(k-1) - 1, where k is the number of bits in exp
• E.g., k = 3: Bias = 3, exp range [1, 6], E range [-2, 3]

• M = 1.xxx...x (base 2, with implied leading 1)
• xxx...x: bits from the frac field
• Minimum when frac = 000...0 (M = 1.0)
• Maximum when frac = 111...1 (M = 1.111...1)
Normalized values - Example

• Compute the decimal value of this 32-bit floating point number
• 0100 0110 0110 1101 1011 0100 0000 0000

Split into fields: 0 | 10001100 | 11011011011010000000000

• k = 8, s = 0
• Significand
• frac = 11011011011010000000000 (base 2)
• M = 1.1101101101101 (base 2)
• Exponent
• exp = 10001100 (base 2) = 140
• Bias = 2^7 - 1 = 127
• E = exp - Bias = 13

v = (-1)^0 * 1.1101101101101 (base 2) * 2^13
  = 11101101101101 (base 2)
  = 15213 (base 10)
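The same example can be verified with Python's `struct` module, which reinterprets the 32-bit pattern as a single-precision float (a sketch, not part of the original slides):

```python
import struct

# Bit pattern from the example: 0 | 10001100 | 11011011011010000000000
bits = 0b0_10001100_11011011011010000000000
value, = struct.unpack('>f', bits.to_bytes(4, 'big'))
print(value)  # 15213.0

# Decoding by hand, following the slide's steps
s    = bits >> 31                # 0
exp  = (bits >> 23) & 0xFF       # 140
frac = bits & 0x7FFFFF
E    = exp - 127                 # 13
M    = 1 + frac / 2**23          # 1.1101101101101 in binary
print((-1)**s * M * 2**E)        # 15213.0
```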
Denormalized values - Definition

s | 0000000...0 | frac        v = (-1)^s * M * 2^E

• When exp = 000...0

• E = 1 - Bias (NOT 0 - Bias)
• M = 0.xxx...x (base 2, with implied leading 0)
• xxx...x: bits from the frac field
• Cases
• exp = 000...0, frac = 000...0 → ZERO
• Note that s = 0/1 gives positive/negative zero
• exp = 000...0, frac ≠ 000...0 → numbers closest to 0.0
• Designed to represent very "small" numbers
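In the 64-bit double format (11 exponent bits, 52 fraction bits, Bias = 1023), the denormalized range can be probed directly; a sketch:

```python
import struct, sys

# All-zero exp with frac = 0...01: the smallest positive (denormalized) double,
# equal to 2^(1 - 1023) * 2^-52 = 2^-1074
tiny, = struct.unpack('<d', (1).to_bytes(8, 'little'))
print(tiny)                 # 5e-324
print(tiny == 2**-1074)     # True
print(sys.float_info.min)   # 2.2250738585072014e-308, the smallest *normalized* double
```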
Special values - Definition

s | 1111111...1 | frac        v = (-1)^s * M * 2^E

• When exp = 111...1

• Case 1: exp = 111...1, frac = 000...0
• Value: ∞ (infinity)
• Positive infinity or negative infinity depending on s
• Used to represent overflows
• e.g., the result of "x = 1.0/0.0"

• Case 2: exp = 111...1, frac ≠ 000...0
• Not-a-Number (NaN)
• Represents cases where no numeric value can be determined
• E.g., the result of sqrt(-1), ∞ - ∞, ∞ × 0
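These special values behave as described in any IEEE-compliant environment. A Python sketch (note that Python raises exceptions for `1.0/0.0` and `sqrt(-1)` instead of silently producing inf/NaN, so we trigger the values through overflow and arithmetic on inf):

```python
import math

inf = float('inf')
print(1e308 * 10)           # inf: the product overflows the double range
print(-1e308 * 10)          # -inf
print(inf - inf)            # nan: no numeric value can be determined
print(math.isnan(inf * 0))  # True
```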
A miniature floating-point example

An 8-bit floating point representation: s | 4-bit exp | 3-bit frac (Bias = 2^3 - 1 = 7)

s exp  frac   E   Value
0 0000 000   -6   0
0 0000 001   -6   1/8 * 1/64 = 1/512   (closest to zero)
0 0000 010   -6   2/8 * 1/64 = 2/512
...
0 0000 110   -6   6/8 * 1/64 = 6/512
0 0000 111   -6   7/8 * 1/64 = 7/512   (largest denormalized)
0 0001 000   -6   8/8 * 1/64 = 8/512   (smallest normalized)
0 0001 001   -6   9/8 * 1/64 = 9/512
...
0 0110 110   -1   14/8 * 1/2 = 14/16
0 0110 111   -1   15/8 * 1/2 = 15/16   (closest to 1 below)
0 0111 000    0   8/8 * 1 = 1
0 0111 001    0   9/8 * 1 = 9/8        (closest to 1 above)
0 0111 010    0   10/8 * 1 = 10/8
...
0 1110 110    7   14/8 * 128 = 224
0 1110 111    7   15/8 * 128 = 240     (largest normalized)
0 1111 000   n/a  inf
0 1111 001   n/a  NaN (not a number)

Rows with exp = 0000 are denormalized numbers; rows with exp from 0001 to 1110 are normalized numbers; rows with exp = 1111 are special values.
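The table can be reproduced mechanically. A sketch of a decoder for this 8-bit format (the function name `decode8` is ours, not from the slides):

```python
def decode8(bits: int) -> float:
    """Decode the 8-bit format: 1 sign bit, 4 exp bits, 3 frac bits, Bias = 7."""
    s, exp, frac = bits >> 7, (bits >> 3) & 0xF, bits & 0x7
    if exp == 0:                           # denormalized: E = 1 - Bias = -6
        val = (frac / 8) * 2 ** -6
    elif exp == 0xF:                       # special values
        val = float('inf') if frac == 0 else float('nan')
    else:                                  # normalized: E = exp - Bias
        val = (1 + frac / 8) * 2 ** (exp - 7)
    return -val if s else val

print(decode8(0b0_0000_001))  # 0.001953125 = 1/512, closest to zero
print(decode8(0b0_0111_000))  # 1.0
print(decode8(0b0_1110_111))  # 240.0, largest normalized
```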
Visualizing floating point encodings
− +
−Normalized −Denorm +Denorm +Normalized

NaN NaN
0 +0

• Only representing discrete numbers in a range


• Limited precision and limited max/min value
• Variable precision: higher as the value is closer to zero
• Smooth transition at the boundary of denorm and normalized
Nice Properties
• Floating point zero same as integer zero
• All bits are 0
• Can (almost) use unsigned integer comparison
• Must first compare sign bits
• Must consider -0 equal to +0
• NaNs are problematic
• Their bit patterns are greater than those of any other value
• What should the comparison yield?
• Otherwise, OK
• Denorm vs. normalized
• Normalized vs. infinity
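These comparison properties are easy to confirm; a quick sketch in Python:

```python
nan = float('nan')

print(-0.0 == 0.0)            # True: comparison must treat the two zeros as equal
print(nan == nan, nan > 1.0)  # False False: every comparison involving NaN is False
print(5e-324 < 2.3e-308)      # True: denormalized doubles sit below the normalized range
print(float('inf') > 1e308)   # True: infinity is greater than any finite value
```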
Appreciate the beauty of the encoding

An 8-bit floating point representation: s | 4-bit exp | 3-bit frac

s exp  frac   E   Value
0 0000 000   -6   0
0 0000 001   -6   1/8 * 1/64 = 1/512
0 0000 010   -6   2/8 * 1/64 = 2/512
...
0 0000 110   -6   6/8 * 1/64 = 6/512
0 0000 111   -6   7/8 * 1/64 = 7/512
0 0001 000   -6   8/8 * 1/64 = 8/512
0 0001 001   -6   9/8 * 1/64 = 9/512
...
0 0110 110   -1   14/8 * 1/2 = 14/16
0 0110 111   -1   15/8 * 1/2 = 15/16
0 0111 000    0   8/8 * 1 = 1
0 0111 001    0   9/8 * 1 = 9/8
0 0111 010    0   10/8 * 1 = 10/8
...
0 1110 110    7   14/8 * 128 = 224
0 1110 111    7   15/8 * 128 = 240
0 1111 000   n/a  inf
0 1111 001   n/a  NaN (not a number)

The denormalized numbers use 8 different bit vectors; the normalized numbers use 112 different bit vectors; the remaining bit vectors encode the special values.
The distribution of the encoded values

• This 8-bit encoding covers non-negative values in the range [0, 240], with 120 available bit vectors representing 120 different values (including both integer and fractional values).
• However, these 120 values are not evenly distributed across the range [0, 240].
• The encoding devotes more bit vectors to small values and fewer bit vectors to large values. For example, 50+ values fall in the range [0, 1], while only 1 value falls in the range [225, 240].
• On the small-value side, the encoding achieves higher precision; towards the large-value side, precision decreases but, in exchange, larger values are covered.
• The design idea is that people usually need higher precision when dealing with small values, while for large values precision is not as critical.
• This is a trade-off between precision and the breadth of the value range. The encoding is clever if this design idea reflects common usage patterns, which is indeed the case in practice.
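The counts quoted above can be verified by enumerating every non-negative finite encoding of the 8-bit format; a sketch:

```python
# Enumerate every non-negative finite value of the 8-bit format (s = 0, Bias = 7)
values = []
for exp in range(15):                  # exp = 1111 encodes inf/NaN, excluded
    for frac in range(8):
        if exp == 0:                   # denormalized
            values.append((frac / 8) * 2 ** -6)
        else:                          # normalized
            values.append((1 + frac / 8) * 2 ** (exp - 7))

print(len(values))                           # 120 finite values
print(sum(v <= 1 for v in values))           # 57 of them lie in [0, 1]
print(sum(225 <= v <= 240 for v in values))  # only 1 lies in [225, 240]
```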
Decimal value  IEEE FP 754 binary

• Steps
• Normalize to have leading 1
s 4-bits exp 3-bits frac
• Round to fit within fraction
• Post-normalize to deal with effects of rounding
• Case study

Value Binary
128 10000000
13 00001101
17 00010001
19 00010011
138 10001010
63 00111111
Normalize

s | 4-bit exp | 3-bit frac


• Requirement
• Set the binary point so the number has the form 1.xxx...x
• Increment the exponent each time you shift the point to the left

Value Binary Fraction E


128 10000000 1.0000000 7
13 00001101 1.1010000 3
17 00010001 1.0001000 4
19 00010011 1.0011000 4
138 10001010 1.0001010 7
63 00111111 1.1111100 5

We have only 3 bits for the frac part, so we need to do rounding!


A rounding problem

• Let's consider rounding the following numbers to integers.

Number   | Traditional rounding | Round to even
10.5     | 11   | 10
11.5     | 12   | 12
12.5     | 13   | 12
13.5     | 14   | 14
14.5     | 15   | 14
15.5     | 16   | 16
16.5     | 17   | 16
17.5     | 18   | 18
18.5     | 19   | 18
19.5     | 20   | 20
Average: 15 | 15.5 | 15

Assume we have the ten numbers listed above; their average value is 15. We want to round each number to an integer.

The second column lists the results of traditional rounding. Each rounded value is 0.5 larger than the original, so the average of the ten rounded numbers becomes 15.5: the traditional rounding scheme introduces a systematic upward bias.

The third column lists the results of round-to-even. Compared to the original values, some rounded values become larger and some become smaller, so the errors cancel: the average of the ten rounded numbers stays 15.
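Python's built-in `round()` uses this round-half-to-even rule for floats, so the comparison can be reproduced directly (the .5 values here are exact in binary, so no hidden representation error interferes):

```python
nums = [10.5, 11.5, 12.5, 13.5, 14.5, 15.5, 16.5, 17.5, 18.5, 19.5]

traditional = [int(x + 0.5) for x in nums]  # always round halves up
to_even     = [round(x) for x in nums]      # round-half-to-even

print(sum(nums) / 10)         # 15.0: average of the originals
print(sum(traditional) / 10)  # 15.5: traditional rounding biases the average upward
print(sum(to_even) / 10)      # 15.0: round-to-even preserves the average
```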
Rounding
s | 4-bit exp | 3-bit frac

1.BBG | RXXX   (kept bits | removed bits)
• Guard bit (G): LSB of the kept result
• Round bit (R): first bit removed
• Sticky bit (S): OR of the remaining removed bits
• Round-up conditions
• R = 1, S = 1 → removed part > half a unit in the last place: round up
• R = 1, S = 0 → exactly halfway: round to even, i.e. round up only if G = 1, so the kept result ends in 0 (an even number)

Value Fraction GRS Incr? Rounded


128 1.0000000 000 NO 1.000
13 1.1010000 100 NO 1.101
17 1.0001000 010 NO 1.000
19 1.0011000 110 YES 1.010
138 1.0001010 011 YES 1.001
63 1.1111100 111 YES 10.000
Rounding

• The above rounding scheme can be broken down into the following cases
• Assume the frac part has three bits: 1.BBGRXXX
• Case 1: R = 0 → no carry in
• E.g., 1.1010110 → 1.101, 1.0000101 → 1.000
• Case 2: R = 1 and at least one of the sticky bits is 1 → carry in
• E.g., 1.0001001 → 1.001, 1.0011010 → 1.010
• Case 3: R = 1 and all of the sticky bits are 0 → depends on G
• Case 3.1: G = 1 → carry in
• E.g., 1.0011000 → 1.010, 1.1011000 → 1.110
• Case 3.2: G = 0 → no carry in
• E.g., 1.1001000 → 1.100, 1.0001000 → 1.000
Post-normalize
• Issue
• Rounding may have caused overflow
• Handle by shifting right once & incrementing exponent

Value | Fraction  | Rounded | exp | Adjust            | Result
128   | 1.0000000 | 1.000   | 7   |                   | 128
13    | 1.1010000 | 1.101   | 3   |                   | 13
17    | 1.0001000 | 1.000   | 4   |                   | 16
19    | 1.0011000 | 1.010   | 4   |                   | 20
138   | 1.0001010 | 1.001   | 7   |                   | 144
63    | 1.1111100 | 10.000  | 5   | 1.000 / exp = 6   | 64
Indication of Rounding
• The case for value 138
Value Fraction Rounded exp Result
138 1.0001010 1.001 7 144

• The bit vector to represent 138 is 0 1110 001


• In fact, 0 1110 001 represents value 144, instead of 138
• Looking at adjacent bit vectors
• 0 1110 000 → 128
• 0 1110 001 → 144
• 0 1110 010 → 160
• 138 cannot be precisely represented by this encoding, so it has been
rounded to the closest value the encoding can represent
Overview
• Background: fractional binary numbers
• IEEE floating point standard: definition
• Examples and properties
• Computation over floating point numbers
Floating point addition

• (-1)^s1 * M1 * 2^E1 + (-1)^s2 * M2 * 2^E2, assume E1 > E2
• Get the binary points lined up: shift M1 left by E1 - E2 bit positions, so both operands are expressed with exponent E2

• Exact result: (-1)^s * M * 2^E
• Sign s, significand M: result of the signed align & add
• Exponent E: E2

• Fixing
• If M ≥ 2, shift M right, increment E
• If M < 1, shift M left k positions, decrement E by k
• Overflow if E is out of range; the computer represents the result as +∞ or -∞
• Round M to fit the frac precision
Floating point multiplication

• (-1)^s1 * M1 * 2^E1 × (-1)^s2 * M2 * 2^E2

s1 | exp1 | frac1
s2 | exp2 | frac2

• Exact result: (-1)^s * M * 2^E
• Sign s: s1 ^ s2
• Significand M: M1 × M2
• Exponent E: E1 + E2

• Fixing
• Round M to fit the frac precision
• If M ≥ 2, shift M right, increment E
• Overflow if E is out of range; the computer represents the result as +∞ or -∞

• Implementation
• The biggest task is multiplying the significands, which takes the most time
Mathematical properties of FP Multiplication
• For the following questions, YES or NO?
• Multiplication Commutative (a*b = b*a)?
• Multiplication Associative ((a*b)*c = a*(b*c))?
• Possibility of overflow, inexactness of rounding
• E.g. (in single precision): (10^20 * 10^20) * 10^-20 = inf,
but 10^20 * (10^20 * 10^-20) = 10^20
• 1 * a = a?
• Multiplication distributes over addition (a*(b+c) = a*b + a*c)?
• Possibility of overflow, inexactness of rounding
• 10^20 * (10^20 - 10^20) = 0.0,
but 10^20 * 10^20 - 10^20 * 10^20 = inf - inf → NaN
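The same failures can be reproduced in double precision by picking values near the double overflow threshold (about 1.8 * 10^308): here we use 10^200 in place of the slide's single-precision 10^20:

```python
import math

a = (1e200 * 1e200) * 1e-200   # inner product overflows to inf; inf * 1e-200 = inf
b = 1e200 * (1e200 * 1e-200)   # inner product is about 1.0, so no overflow
print(a)                       # inf: associativity fails
print(math.isfinite(b))        # True

c = 1e200 * (1e200 - 1e200)          # 1e200 * 0.0
d = 1e200 * 1e200 - 1e200 * 1e200    # inf - inf
print(c, math.isnan(d))              # 0.0 True: distributivity fails
```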
Mathematical properties of FP addition
• For the following questions, YES or NO?
• Commutative (a+b = b+a)?
• Associative ((a+b)+c = a+(b+c))??
• Overflow and rounding
• (3.14 + 10^20) - 10^20 = 0.0
• 3.14 + (10^20 - 10^20) = 3.14
• 0 is additive identity?
• Every element (x) has an additive inverse (exists x’, so that
x+x’=0)?
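A quick check of these addition properties in double precision:

```python
import math

print((3.14 + 1e20) - 1e20)  # 0.0: 3.14 vanishes when aligned against 1e20
print(3.14 + (1e20 - 1e20))  # 3.14: so addition is not associative

print(0.0 + 2.5 == 2.5)      # True: 0 is an additive identity
inf = float('inf')
print(inf + (-inf))          # nan: infinity has no additive inverse
```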
− +
−Normalized −Denorm +Denorm +Normalized

NaN NaN
0 +0
Thank You
