Digital Engineering

Fall 2024

Lecture 02 - Floating Point Arithmetic

Instructor: Dr. Tarek Abdul Hamid
The World is Not Just Integers
• Programming languages support numbers with a fractional part
  ◦ These are called floating-point numbers
  ◦ Examples:
    3.14159265… (π)
    2.71828… (e)
    0.000000001 or $1.0 \times 10^{-9}$ (seconds in a nanosecond)
    86,400,000,000,000 or $8.64 \times 10^{13}$ (nanoseconds in a day)
    The last number is an integer too large to fit in a 32-bit integer
• We use scientific notation to represent
  ◦ Very small numbers (e.g. $1.0 \times 10^{-9}$)
  ◦ Very large numbers (e.g. $8.64 \times 10^{13}$)
• Scientific notation: $\pm\, d.f_1 f_2 f_3 f_4 \ldots \times 10^{\pm e_1 e_2 e_3}$
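
As a quick check of the last example, the sketch below compares the number of nanoseconds in a day with the largest signed 32-bit integer. The constant names are illustrative and do not come from the lecture.

```python
# Nanoseconds in a day versus the range of a signed 32-bit integer.
NS_PER_SECOND = 10**9
SECONDS_PER_DAY = 24 * 60 * 60            # 86,400 seconds
ns_per_day = SECONDS_PER_DAY * NS_PER_SECOND

INT32_MAX = 2**31 - 1                     # 2,147,483,647

print(ns_per_day)                         # 86400000000000
print(ns_per_day > INT32_MAX)             # True: does not fit in 32 bits
print(f"{ns_per_day:.2e}")                # 8.64e+13
```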

Floating-Point Numbers
• Examples of floating-point numbers in base 10:
  ◦ $5.341 \times 10^{3}$, $0.05341 \times 10^{5}$, $-2.013 \times 10^{-1}$, $-201.3 \times 10^{-3}$ (the point is the decimal point)
• Examples of floating-point numbers in base 2:
  ◦ $1.00101 \times 2^{23}$, $0.0100101 \times 2^{25}$, $-1.101101 \times 2^{-3}$, $-1101.101 \times 2^{-6}$ (the point is the binary point)
  ◦ Exponents are kept in decimal for clarity
  ◦ The binary number $(1101.101)_2 = 2^{3} + 2^{2} + 2^{0} + 2^{-1} + 2^{-3} = 13.625$
• Floating-point numbers should be normalized
  ◦ Exactly one non-zero digit should appear before the point
  ◦ In a decimal number, this digit can be from 1 to 9
  ◦ In a binary number, this digit should be 1
  ◦ Normalized FP numbers: $5.341 \times 10^{3}$ and $-1.101101 \times 2^{-3}$
  ◦ NOT normalized: $0.05341 \times 10^{5}$ and $-1101.101 \times 2^{-6}$
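
The two ideas on this slide, evaluating a binary number with a binary point and normalizing it, can be sketched as follows. The function names `binary_to_decimal` and `normalize` are hypothetical helpers written for this illustration; they handle unsigned bit strings only, so the sign is tracked separately.

```python
def binary_to_decimal(bits: str) -> float:
    """Evaluate an unsigned binary string such as '1101.101'."""
    whole, _, frac = bits.partition(".")
    value = float(int(whole, 2)) if whole else 0.0
    for i, bit in enumerate(frac, start=1):
        value += int(bit) * 2.0 ** -i     # digit i after the point weighs 2^-i
    return value


def normalize(bits: str, exponent: int) -> tuple[str, int]:
    """Rewrite bits x 2^exponent so exactly one 1 precedes the binary point.

    Raises ValueError if the string contains no 1 (zero cannot be normalized).
    """
    whole, _, frac = bits.partition(".")
    digits = whole + frac
    first_one = digits.index("1")         # position of the leading 1
    shift = len(whole) - 1 - first_one    # how far the point moves to the left
    mantissa = digits[first_one] + "." + (digits[first_one + 1:] or "0")
    return mantissa, exponent + shift


print(binary_to_decimal("1101.101"))      # 13.625
print(normalize("1101.101", -6))          # ('1.101101', -3)
print(normalize("0.0100101", 25))         # ('1.00101', 23)
```

Moving the point left by k places is compensated by adding k to the exponent, which is why both unnormalized examples on the slide map to the normalized ones.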

Floating-Point Representation
• A floating-point number is represented by the triple (S, E, F)
  ◦ S is the Sign bit (0 is positive and 1 is negative)
    Representation is called sign and magnitude
  ◦ E is the Exponent field (signed)
    Very large numbers have large positive exponents
    Very small, close-to-zero numbers have negative exponents
    More bits in the exponent field increase the range of values
  ◦ F is the Fraction field (fraction after the binary point)
    More bits in the fraction field improve the precision of FP numbers

Field layout: | S | Exponent | Fraction |

Value of a floating-point number = $(-1)^{S} \times \mathrm{val}(F) \times 2^{\mathrm{val}(E)}$

IEEE 754 Floating-Point Standard
• Found in virtually every computer invented since 1980
  ◦ Simplified porting of floating-point numbers
  ◦ Unified the development of floating-point algorithms
  ◦ Increased the accuracy of floating-point numbers
• Single Precision Floating Point Numbers (32 bits)
  ◦ 1-bit sign + 8-bit exponent + 23-bit fraction
  ◦ Field layout: | S (1 bit) | Exponent (8 bits) | Fraction (23 bits) |
• Double Precision Floating Point Numbers (64 bits)
  ◦ 1-bit sign + 11-bit exponent + 52-bit fraction
  ◦ Field layout: | S (1 bit) | Exponent (11 bits) | Fraction (52 bits, continued in the second 32-bit word) |
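
To make the field widths concrete, here is a minimal sketch that splits a number into its (S, E, F) fields using Python's standard `struct` module. The helper names `float32_fields` and `float64_fields` are assumptions made for this example.

```python
import struct


def float32_fields(x: float) -> tuple[int, int, int]:
    """Return (S, E, F) as integers for x stored in single precision."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    sign = word >> 31                     # 1 bit
    exponent = (word >> 23) & 0xFF        # 8 bits
    fraction = word & 0x7FFFFF            # 23 bits
    return sign, exponent, fraction


def float64_fields(x: float) -> tuple[int, int, int]:
    """Return (S, E, F) as integers for x stored in double precision."""
    (word,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = word >> 63                     # 1 bit
    exponent = (word >> 52) & 0x7FF       # 11 bits
    fraction = word & ((1 << 52) - 1)     # 52 bits
    return sign, exponent, fraction


print(float32_fields(-0.15625))           # (1, 124, 2097152)
print(float64_fields(1.0))                # (0, 1023, 0)
```

The exponent printed here is the raw stored field, not yet the signed exponent value.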
Normalized Floating Point Numbers
• For a normalized floating point number (S, E, F):
  ◦ Field layout: | S | E | F = $f_1 f_2 f_3 f_4 \ldots$ |
  ◦ Significand is equal to $(1.F)_2 = (1.f_1 f_2 f_3 f_4 \ldots)_2$
  ◦ IEEE 754 assumes a hidden leading 1. (not stored) for normalized numbers
  ◦ Significand is 1 bit longer than the fraction
• Value of a normalized floating point number is

$(-1)^{S} \times (1.F)_2 \times 2^{\mathrm{val}(E)}$
$= (-1)^{S} \times (1.f_1 f_2 f_3 f_4 \ldots)_2 \times 2^{\mathrm{val}(E)}$
$= (-1)^{S} \times (1 + f_1 \times 2^{-1} + f_2 \times 2^{-2} + f_3 \times 2^{-3} + f_4 \times 2^{-4} + \ldots) \times 2^{\mathrm{val}(E)}$

$(-1)^{S}$ is 1 when S is 0 (positive), and -1 when S is 1 (negative)
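
The expanded formula can be evaluated directly. In this sketch the function name `normalized_value` and its argument layout (sign bit, fraction bits as a string, already-unbiased exponent) are assumptions chosen for illustration.

```python
def normalized_value(sign: int, fraction_bits: str, exponent: int) -> float:
    """Evaluate (-1)^S x (1 + f1*2^-1 + f2*2^-2 + ...) x 2^exponent."""
    significand = 1.0                     # the hidden leading 1
    for i, bit in enumerate(fraction_bits, start=1):
        significand += int(bit) * 2.0 ** -i
    return (-1) ** sign * significand * 2.0 ** exponent


# Fraction 01000... with exponent -3 and sign 1
print(normalized_value(1, "01", -3))      # -0.15625
# Fraction 010011000... with exponent 3 and sign 0
print(normalized_value(0, "010011", 3))   # 10.375
```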

Biased Exponent Representation
• How to represent a signed exponent? Choices are:
  ◦ Sign + magnitude representation for the exponent
  ◦ Two's complement representation
  ◦ Biased representation
• IEEE 754 uses biased representation for the exponent
  ◦ Value of exponent = val(E) = E - Bias (Bias is a constant)
• Recall that the exponent field is 8 bits for single precision
  ◦ E can be in the range 0 to 255
  ◦ E = 0 and E = 255 are reserved for special use (discussed later)
  ◦ E = 1 to 254 are used for normalized floating point numbers
  ◦ Bias = 127 (half of 254), val(E) = E - 127
  ◦ val(E=1) = -126, val(E=127) = 0, val(E=254) = 127
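
A small sketch of the biased exponent, parameterized by field width. The slide gives the bias as a constant (127); the closed form 2^(k-1) - 1 for a k-bit field used below is the standard IEEE 754 expression and is not stated on the slide. The function names are hypothetical.

```python
def bias(exponent_bits: int) -> int:
    """Bias for a k-bit exponent field: 2^(k-1) - 1 (127 for k = 8)."""
    return 2 ** (exponent_bits - 1) - 1


def exponent_value(E: int, exponent_bits: int) -> int:
    """val(E) = E - Bias, valid only for normalized numbers."""
    largest_field = 2 ** exponent_bits - 1
    if not 1 <= E <= largest_field - 1:
        raise ValueError("E = 0 and the all-ones field are reserved")
    return E - bias(exponent_bits)


print(bias(8))                            # 127
print(exponent_value(1, 8))               # -126
print(exponent_value(127, 8))             # 0
print(exponent_value(254, 8))             # 127
```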

Biased Exponent – Cont’d
• For double precision, the exponent field is 11 bits
  ◦ E can be in the range 0 to 2047
  ◦ E = 0 and E = 2047 are reserved for special use
  ◦ E = 1 to 2046 are used for normalized floating point numbers
  ◦ Bias = 1023 (half of 2046), val(E) = E - 1023
  ◦ val(E=1) = -1022, val(E=1023) = 0, val(E=2046) = 1023
• Value of a normalized floating point number is

$(-1)^{S} \times (1.F)_2 \times 2^{E - \mathrm{Bias}}$
$= (-1)^{S} \times (1.f_1 f_2 f_3 f_4 \ldots)_2 \times 2^{E - \mathrm{Bias}}$
$= (-1)^{S} \times (1 + f_1 \times 2^{-1} + f_2 \times 2^{-2} + f_3 \times 2^{-3} + f_4 \times 2^{-4} + \ldots) \times 2^{E - \mathrm{Bias}}$
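
Putting sign, biased exponent, and hidden 1 together, the following sketch decodes a bit string in either precision with exact rational arithmetic. The name `decode_normalized` is an assumption for this example, and the bias is computed as 2^(k-1) - 1, which gives 127 and 1023 for the two field widths above.

```python
from fractions import Fraction


def decode_normalized(bits: str, exponent_bits: int) -> Fraction:
    """Decode a normalized bit string as (-1)^S x (1.F)_2 x 2^(E - Bias)."""
    sign = int(bits[0])
    E = int(bits[1:1 + exponent_bits], 2)
    fraction = bits[1 + exponent_bits:]
    if E == 0 or E == 2 ** exponent_bits - 1:
        raise ValueError("reserved exponent: not a normalized number")
    bias = 2 ** (exponent_bits - 1) - 1
    # 1.F: the hidden 1 plus the fraction bits scaled by 2^-(number of bits)
    significand = 1 + Fraction(int(fraction, 2), 2 ** len(fraction))
    return (-1) ** sign * significand * Fraction(2) ** (E - bias)


single = "10111110001000000000000000000000"
double = "01000000010100101010000000000000" + "0" * 32
print(float(decode_normalized(single, 8)))    # -0.15625
print(float(decode_normalized(double, 11)))   # 74.5
```

Using `Fraction` keeps every step exact, so the result does not depend on the floating-point behavior of the machine running the check.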

Examples of Single Precision Float
• What is the decimal value of this single precision float?
  10111110001000000000000000000000
• Solution:
  ◦ Sign = 1 is negative
  ◦ Exponent = $(01111100)_2 = 124$, E - Bias = 124 - 127 = -3
  ◦ Significand = $(1.0100 \ldots 0)_2 = 1 + 2^{-2} = 1.25$ (the leading 1. is implicit)
  ◦ Value in decimal = $-1.25 \times 2^{-3} = -0.15625$
• What is the decimal value of this single precision float?
  01000001001001100000000000000000
• Solution:
  ◦ Value in decimal = $+(1.01001100 \ldots 0)_2 \times 2^{130-127}$ (the leading 1. is implicit)
    $= (1.01001100 \ldots 0)_2 \times 2^{3} = (1010.01100 \ldots 0)_2 = 10.375$
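
Both answers can be cross-checked by letting the machine reinterpret the 32 bits as a float. This is a sketch using the standard `struct` module; `bits_to_float32` is a hypothetical helper name.

```python
import struct


def bits_to_float32(bits: str) -> float:
    """Reinterpret a 32-character bit string as a single precision float."""
    if len(bits) != 32:
        raise ValueError("expected exactly 32 bits")
    return struct.unpack(">f", int(bits, 2).to_bytes(4, "big"))[0]


print(bits_to_float32("10111110001000000000000000000000"))   # -0.15625
print(bits_to_float32("01000001001001100000000000000000"))   # 10.375
```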
Examples of Double Precision Float
• What is the decimal value of this double precision float?
  01000000010100101010000000000000
  00000000000000000000000000000000
• Solution:
  ◦ Value of exponent = $(10000000101)_2$ - Bias = 1029 - 1023 = 6
  ◦ Value of double float = $(1.00101010 \ldots 0)_2 \times 2^{6}$ (the leading 1. is implicit)
    $= (1001010.10 \ldots 0)_2 = 74.5$
• What is the decimal value of this double precision float?
  10111111100010000000000000000000
  00000000000000000000000000000000
  ◦ Do it yourself! (answer should be $-1.5 \times 2^{-7} = -0.01171875$)
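
One way to work the exercise step by step is sketched below: the fields are sliced out by hand and the result is compared with the machine's own interpretation of the same 64 bits. Variable names are illustrative.

```python
import struct

bits = "10111111100010000000000000000000" + "0" * 32

sign = int(bits[0])                       # 1 bit
E = int(bits[1:12], 2)                    # 11-bit exponent field
fraction = int(bits[12:], 2) / 2**52      # 52-bit fraction as a value in [0, 1)

value = (-1) ** sign * (1 + fraction) * 2.0 ** (E - 1023)
hardware = struct.unpack(">d", int(bits, 2).to_bytes(8, "big"))[0]

print(sign, E, E - 1023, 1 + fraction)    # 1 1016 -7 1.5
print(value)                              # -0.01171875
print(value == hardware)                  # True
```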

Converting FP Decimal to Binary
• Convert -0.8125 to binary in single and double precision
• Solution:
  ◦ Fraction bits can be obtained using multiplication by 2
    0.8125 × 2 = 1.625
    0.625 × 2 = 1.25
    0.25 × 2 = 0.5
    0.5 × 2 = 1.0
    Stop when the fractional part is 0
  ◦ $0.8125 = (0.1101)_2 = 1/2 + 1/4 + 1/16 = 13/16$
  ◦ Fraction = $(0.1101)_2 = (1.101)_2 \times 2^{-1}$ (normalized)
  ◦ Exponent = -1 + Bias = 126 (single precision) and 1022 (double precision)
• Single precision:
  10111111010100000000000000000000
• Double precision:
  10111111111010100000000000000000
  00000000000000000000000000000000
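
The procedure above can be sketched in code. The helpers `fraction_to_bits` and `encode_fraction` are hypothetical names; the sketch assumes a nonzero input with magnitude below 1 whose normalized result stays in the normalized exponent range, and it truncates extra bits instead of rounding, which is a simplification.

```python
def fraction_to_bits(x: float, max_bits: int = 60) -> str:
    """Binary digits of 0 <= x < 1 by repeated multiplication by 2."""
    bits = []
    while x != 0 and len(bits) < max_bits:
        x *= 2
        bit = int(x)                      # the integer part is the next bit
        bits.append(str(bit))
        x -= bit                          # keep only the fractional part
    return "".join(bits)


def encode_fraction(value: float, exponent_bits: int, fraction_bits: int) -> str:
    """Encode a nonzero value with |value| < 1 as sign, exponent, fraction."""
    sign = "1" if value < 0 else "0"
    digits = fraction_to_bits(abs(value))             # '1101' for 0.8125
    first_one = digits.index("1")
    exponent = -(first_one + 1)                       # (0.1101)_2 = (1.101)_2 x 2^-1
    fraction = digits[first_one + 1:].ljust(fraction_bits, "0")[:fraction_bits]
    bias = 2 ** (exponent_bits - 1) - 1
    return sign + format(exponent + bias, f"0{exponent_bits}b") + fraction


print(fraction_to_bits(0.8125))           # 1101
print(encode_fraction(-0.8125, 8, 23))    # 10111111010100000000000000000000
```

Calling `encode_fraction(-0.8125, 11, 52)` gives the 64-bit double precision pattern in the same way.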
Largest Normalized Float
• What is the largest normalized float?
• Solution for single precision:
  01111111011111111111111111111111
  ◦ Exponent - Bias = 254 - 127 = 127 (largest exponent for SP)
  ◦ Significand = $(1.111 \ldots 1)_2$ = almost 2
  ◦ Value in decimal $\approx 2 \times 2^{127} \approx 2^{128} \approx 3.4028\ldots \times 10^{38}$
• Solution for double precision:
  01111111111011111111111111111111
  11111111111111111111111111111111
  ◦ Value in decimal $\approx 2 \times 2^{1023} \approx 2^{1024} \approx 1.79769\ldots \times 10^{308}$
• Overflow: the exponent is too large to fit in the exponent field
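
These limits can be checked numerically. The sketch assumes that Python's `float` is an IEEE 754 double, which holds for CPython on mainstream platforms; variable names are illustrative.

```python
import math
import struct
import sys

# Largest single: sign 0, exponent field 254, fraction all ones.
single_bits = "0" + "11111110" + "1" * 23
largest_single = struct.unpack(">f", int(single_bits, 2).to_bytes(4, "big"))[0]
print(largest_single == (2 - 2**-23) * 2.0**127)          # True
print(f"{largest_single:.4e}")                            # 3.4028e+38

# Largest double: sign 0, exponent field 2046, fraction all ones.
double_bits = "0" + "11111111110" + "1" * 52
largest_double = struct.unpack(">d", int(double_bits, 2).to_bytes(8, "big"))[0]
print(largest_double == sys.float_info.max)               # True
print(f"{largest_double:.5e}")                            # 1.79769e+308

# Going past the largest double overflows to infinity.
print(math.isinf(largest_double * 2))                     # True
```

The exact significand is 2 minus one unit in the last fraction bit, which is why the slide describes it as "almost 2".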

Smallest Normalized Float
• What is the smallest (in absolute value) normalized float?
• Solution for single precision:
  00000000100000000000000000000000
  ◦ Exponent - Bias = 1 - 127 = -126 (smallest exponent for SP)
  ◦ Significand = $(1.000 \ldots 0)_2 = 1$
  ◦ Value in decimal = $1 \times 2^{-126} = 1.17549\ldots \times 10^{-38}$
• Solution for double precision:
  00000000000100000000000000000000
  00000000000000000000000000000000
  ◦ Value in decimal = $1 \times 2^{-1022} = 2.22507\ldots \times 10^{-308}$
• Underflow: the exponent is too small to fit in the exponent field
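
The same kind of check works for the smallest normalized values. As before, this is a sketch that assumes Python's `float` is an IEEE 754 double; variable names are illustrative.

```python
import struct
import sys

# Smallest normalized single: sign 0, exponent field 1, fraction all zeros.
single_bits = "0" + "00000001" + "0" * 23
smallest_single = struct.unpack(">f", int(single_bits, 2).to_bytes(4, "big"))[0]
print(smallest_single == 2.0**-126)                       # True
print(f"{smallest_single:.5e}")                           # 1.17549e-38

# Smallest normalized double: sign 0, exponent field 1, fraction all zeros.
double_bits = "0" + "00000000001" + "0" * 52
smallest_double = struct.unpack(">d", int(double_bits, 2).to_bytes(8, "big"))[0]
print(smallest_double == sys.float_info.min)              # True
print(f"{smallest_double:.5e}")                           # 2.22507e-308
```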
