
ENSC254 – Floating Point Computation


School of Engineering Science


Simon Fraser University
Burnaby, BC, Canada
Floating Point Operations

• Integer computation is the reference choice for most signal processing applications (Sound / Video / Telecommunications), and embedded processors have historically focused mostly on integers

• Floating point numbers come into play when we are interested in a large range of
values

• Traditionally, floating point algorithms have been manually ported to the integer domain for embedded systems. BUT, as the complexity and performance of embedded systems increase exponentially, floating point computations are becoming increasingly common

• Floating point computation can be implemented with two strategies:


• HARDWARE IMPLEMENTATION
• EMULATION

2
IEEE 754 - 2008

• FP calculation can be performed in hardware in different ways and at different precisions, which significantly affect chip area and power consumption

• In order to clarify what expectations a customer should have of a computer, the IEEE defined a reference standard document, 754-2008

• This standard specifies interchange and arithmetic formats and methods for
floating-point arithmetic in computer environments.
• It specifies exception conditions and their default handling.
• An implementation of a floating-point system conforming to this standard may
be realized entirely in software/firmware, entirely in hardware, or in any
combination of software/firmware and hardware.

3
Floating Point in C environment

• IEEE 754-2008 defines 4 binary FP formats: 16, 32, 64 and 128 bits
(half precision, single precision, double precision and quad precision)

• C, C++ and Java use the following type formats:


• 32-bit (FLOAT)
• 64-bit (DOUBLE)

• ARM supports the 16-bit format for storage, and the 32-bit format for computation
[but one needs to activate such support with a compiler flag]
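• A quick check of the C type mapping (a sketch, assuming a toolchain where float and double follow IEEE 754, as on ARM):

#include <stdio.h>

int main(void)
{
    /* float maps to the 32-bit (single precision) format,
       double to the 64-bit (double precision) format */
    printf("float:  %zu bytes\n", sizeof(float));   /* 4 */
    printf("double: %zu bytes\n", sizeof(double));  /* 8 */
    return 0;
}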

4
Floating point Support in ARM

[Diagram: Floating point support options: manual porting, software libraries, hardware]

• Support for FP on ARM has followed the market for embedded systems:

• The very successful ARM7TDMI architecture does not have any hardware floating point support
• As the need for floating point applications emerged from MATLAB into embedded systems, users started to do the porting to integer code by hand
• Soon, the cost of porting became unbearable, and to avoid losing market ARM started to work on emulation libraries that efficiently translate software FP operations into integer HW operations using the GP registers as operands
• As FP applications became increasingly common in embedded systems, the cost of emulation became too severe, and a HW FPU started appearing in ARM9 and ARM11
5
FP Numbers representation

Floating point numbers are represented in HW according to the following equation:

F = (−1)^s * 2^(exp − bias) * 1.f

bit 31: s | bits 30:23: exponent | bits 22:0: fraction

• Bias = fixed value depending on the format, 127 for single precision
• exp ranges from 1 to 254, so the representable exponent (exp − bias) ranges from −126 to 127
• exp = 0 and exp = 255 have a special meaning and do not represent normal numbers
• The part 1.f is called the “significand” and MUST by definition lie between 1 and 2

See Hohl textbook page 180 for analogous 16 and 64 bit formats
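• To make the field layout concrete, here is a minimal C sketch that extracts the three fields (assuming float is the IEEE 754 single precision format, as on ARM):

#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void)
{
    float f = 6.5f;
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);        /* reinterpret the 32 bits unchanged */

    uint32_t s    = bits >> 31;            /* bit 31 */
    uint32_t exp  = (bits >> 23) & 0xFF;   /* bits 30:23 */
    uint32_t frac = bits & 0x7FFFFF;       /* bits 22:0 */

    printf("s=%u exp=%u (exp-bias=%d) frac=0x%06X\n",
           (unsigned)s, (unsigned)exp, (int)exp - 127, (unsigned)frac);
    /* prints: s=0 exp=129 (exp-bias=2) frac=0x500000 */
    return 0;
}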

6
Example from textbook (Page 181)

Describe the single-precision representation of 6.5:

S=0 (Positive number)


We need to find a representation of 6.5 with a factor between 1 and 2, to comply with the format of the significand [1.f] in the previous slide

• 6.5/2 = 3.25
• 6.5/4 = 1.625

=> 6.5 = (−1)^0 * 2^2 * 1.625 ⇒ s = 0, exp − bias = 2 ⇒ exp = 129 = 0x81

=> 0.625 = 1/2 + 1/8 ⇒ f = 101

Result=0b0|100 0000 1|101 0000 0000 0000 0000 0000 = 0x40D00000
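• A quick way to double-check the hand-derived encoding (a sketch, assuming IEEE 754 single precision floats):

#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void)
{
    uint32_t bits = 0x40D00000;   /* encoding derived above */
    float f;
    memcpy(&f, &bits, sizeof f);  /* reinterpret the bits as a float */
    printf("%f\n", f);            /* prints 6.500000 */
    return 0;
}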

7
Example from Textbook

Describe the single-precision representation of -0.4375:

8
Example from Textbook

Describe the single-precision representation of -0.4375 :

S=1 (Negative number)


We need to find a representation of 0.4375 with a factor between 1 and 2, to comply with the format of the significand [1.f] in the previous slide

• 0.4375 / 2^(−1) = 0.875
• 0.4375 / 2^(−2) = 1.75 => exp − bias = −2

=> −0.4375 = (−1)^1 * 2^(−2) * 1.75 ⇒ s = 1, exp − bias = −2 ⇒ exp = 125 = 0x7D

=> 0.75 = 1/2 + 1/4 ⇒ f = 11

Result = 0b1|011 1110 1|110 0000 0000 0000 0000 0000 = 0xBEE00000

9
Range of FP Numbers

• Floating point numbers cover a much larger range than integers, but they are represented with the same number of bits (32 for single precision)

• This means that they can be less accurate: the distance between one representable value and the next can be larger, leading to possible ERRORS in the REPRESENTATION
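• A small sketch of such a representation error: 16777217 (that is, 2^24 + 1) has no exact single precision encoding, so it silently rounds to its neighbour:

#include <stdio.h>

int main(void)
{
    /* above 2^24 the gap between consecutive single precision
       values is already 2, so 2^24 + 1 cannot be represented */
    float f = 16777217.0f;              /* rounds to 16777216.0f */
    printf("%.1f\n", f);                /* prints 16777216.0 */
    printf("%d\n", f == 16777216.0f);   /* prints 1 */
    return 0;
}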

10
Accuracy of Floating Point Numbers

• One important thing to notice is that the FP representation imposes limitations on the “precision” of FP numbers

F = (−1)^s * 2^(exp − bias) * 1.f

• The precision of a binary representation is the distance between two successive numbers: if the number is defined by the product of its binary fraction times an exponent, then the distance between two consecutive FP numbers depends on their exponent:
• The larger the exponent, the larger the distance between two consecutive representations, and hence the smaller the precision
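• This can be observed directly with nextafterf() from <math.h> (a sketch; compile with -lm on a hosted toolchain):

#include <stdio.h>
#include <math.h>

int main(void)
{
    /* the distance to the next representable float grows with the exponent */
    printf("gap after 1.0:  %g\n", nextafterf(1.0f, 2.0f) - 1.0f);
    /* prints 1.19209e-07, i.e. 2^(0-23) */
    printf("gap after 2^20: %g\n", nextafterf(1048576.0f, 2e6f) - 1048576.0f);
    /* prints 0.125, i.e. 2^(20-23) */
    return 0;
}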

11
De-normal / Subnormal Numbers

• A NORMAL FP encoding is an encoding whose significand is assumed to have a leading 1, such as the case we just introduced.
• The smallest positive normal number available with FP representation is

F = 2^(−126) * 1.0 ≈ 1.18 * 10^(−38)

• If we want to represent even smaller numbers we can use de-normalized numbers (a.k.a. sub-normal), defined by the formula

F = (−1)^s * 2^(−126) * 0.f

In this case we can add 2^23 more numbers to the list of representable floats. Sub-normal numbers are indicated by an exponent field of zero, that is, all numbers with exp = 0 are considered de-normalized.
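• A minimal sketch showing the boundary, using <float.h> and fpclassify() from C99:

#include <stdio.h>
#include <float.h>
#include <math.h>

int main(void)
{
    float smallest_normal = FLT_MIN;        /* 2^-126 ≈ 1.18e-38 */
    float sub = smallest_normal / 2.0f;     /* 2^-127: exponent field becomes 0 */

    printf("%g is %s\n", smallest_normal,
           fpclassify(smallest_normal) == FP_NORMAL ? "normal" : "other");
    printf("%g is %s\n", sub,
           fpclassify(sub) == FP_SUBNORMAL ? "subnormal" : "other");
    return 0;
}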

12
Zero and Infinity

• The FP representation includes two zeros (+/-0) and two infinity values (+/-∞). Infinity is considered as a mathematical concept, and NOT as the maximum representable value!

• 0 is represented as a number with an exponent of 0 and a significand of 0
• ∞ is represented as a number with an exponent of all 1s and a significand of 0

bit 31: s | bits 30:23: exponent | bits 22:0: fraction
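• The four encodings can be inspected directly (a sketch, assuming IEEE 754 single precision floats):

#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <math.h>

static uint32_t bits_of(float f)
{
    uint32_t b;
    memcpy(&b, &f, sizeof b);   /* reinterpret the bits unchanged */
    return b;
}

int main(void)
{
    printf("+0   = 0x%08X\n", (unsigned)bits_of(0.0f));      /* 0x00000000 */
    printf("-0   = 0x%08X\n", (unsigned)bits_of(-0.0f));     /* 0x80000000 */
    printf("+inf = 0x%08X\n", (unsigned)bits_of(INFINITY));  /* 0x7F800000 */
    printf("-inf = 0x%08X\n", (unsigned)bits_of(-INFINITY)); /* 0xFF800000 */
    return 0;
}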

13
Not a Number (NaN)

• NaN can be used to describe different situations:

• The result of a computation in the presence of unexpected conditions (in this case the condition is specified in the significand – we call this a “signalling” NaN)
• The default value for registers that were not initialized (a “quiet” NaN)

• IEEE 754-2008 imposes that the result of any operation involving a NaN is a NaN

• NaNs are encoded with an exponent of all ‘1’s and a non-zero significand
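• A small sketch of NaN behaviour (0.0/0.0 is one way to produce one; isnan() is from <math.h>):

#include <stdio.h>
#include <math.h>

int main(void)
{
    float nan1 = 0.0f / 0.0f;   /* invalid operation produces a NaN */
    float nan2 = nan1 + 1.0f;   /* any operation involving a NaN yields a NaN */

    printf("isnan: %d %d\n", isnan(nan1) != 0, isnan(nan2) != 0);  /* 1 1 */
    printf("NaN == NaN: %d\n", nan1 == nan1);  /* 0: a NaN compares unequal to itself */
    return 0;
}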

14
Implementation of FP Operations in ARM

• ARM7TDMI has no hardware FP support. FP calculations are computed using software emulation, and FP numbers are mapped onto integer registers

• The latest architectures from ARM, such as the Cortex family, allow the use of hardware FP calculation

• FP operations (sum, subtraction, multiplication, division) tend to be very demanding from a hardware point of view, so they COULD NOT POSSIBLY FIT in the processor pipeline
• For this reason, historically in most processor architectures FP calculations are performed by a CO-PROCESSOR

15
Co-Processor Computation

[Diagram: microprocessor and co-processor, each with its own register file (rfile), exchanging operands and results]

• Performing FP operations IN the main processor pipeline would require long operations that reserve processor registers for several cycles and disrupt the program flow
• It is instead convenient to reserve a specific co-processor for FP operations, with an independent register file. This is called the Floating Point Unit (FPU)

16
Coprocessor vs Memory mapped peripheral

• There are two ways to map a computation peripheral onto a processor architecture: as a Memory Mapped Peripheral or as a Co-Processor

• Ex: Suppose we want to use a hardware divider [note: an integer divider is almost never present in embedded processors, while an FP divider is much more common]:
• We can map the divider as a peripheral on the BUS:

[Diagram: CPU and DIVIDER connected through the bus]

z=x/y;
LDR R4, =base_addr_divider
STR R0, [R4]         ; write operand x
STR R1, [R4, #4]     ; write operand y
[....wait necessary time ....]
LDR R2, [R4, #8]     ; read back the result z

• Memory mapped peripherals are loosely coupled to the CPU. They have NO IMPACT on the CPU architecture and ISA, which sees the divider as a part of the memory map. But memory accesses can be demanding in terms of cycle time

17
Coprocessor vs Memory mapped peripheral (2)

• Alternatively, we can map the divider as a co-processor. In this configuration, the coprocessor is a separate datapath with an independent pipeline, but there are specific instructions in the ISA for moving data back and forth between the CPU and the coprocessor and for starting computations. The co-processor is tightly coupled to the CPU

[Diagram: CPU connected to the DIVIDER coprocessor through MTC/MFC paths, beside the bus]

z=x/y;
MTC CR0, R4             ; move operand x to the coprocessor
MTC CR1, R5             ; move operand y to the coprocessor
COP.div CR2, CR0, CR1   ; start the division
[....wait necessary time ....]
MFC CR2, R2             ; move the result back

MTC=Move to Coprocessor
MFC=Move from Coprocessor

• In this case, it is customary for the coprocessor to have an internal register file to store temporary values internally, minimizing transfers to/from the main CPU

18
Floating Points Units

• Many processor architectures (e.g. MIPS) use a floating point coprocessor

• The case of the ARM Cortex Floating Point Unit is very similar: floating point operations (where available) are deployed on a separate datapath that is tightly coupled to the main CPU

• The FPU is implemented as a coprocessor: the FPU has an independent register file composed of 32 registers of 32 bits each.
• 1 FP register stores one single precision FP number
• 2 FP registers can be used to represent a double precision FP number
• The upper/lower 16 bits of an FP register can be used to store a half-precision number

• The ARM FPU also has two FP control registers: FPSCR (Floating Point Status and Control Register) and CPACR (Coprocessor Access Control Register)

19
Cortex FP Registers

• The 32 FP registers are “flat”: there is no specific usage convention such as in the case of the integer registers; they can all be used interchangeably. The mnemonics s# (single precision) or d# (double precision) are used instead of r# to indicate floating point registers

• FPSCR (Floating Point Status and Control Register) is the equivalent of the integer CPSR, and stores operation information

• CPACR (Coprocessor Access Control Register) holds information about access permission to the available coprocessor slots (0–15 in Cortex). The FPU is at slot 10 (single precision) and slot 11 (double precision)
• Note: physically the FPU is one coprocessor, but since the slot number is added to any coprocessor instruction, the different slot numbers are used to tell the FPU the type of operation expected
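• Before any FP instruction can execute, access to slots 10 and 11 must be granted in CPACR. A minimal sketch, assuming a Cortex-M4 and a GCC-style toolchain (CMSIS exposes the same register as SCB->CPACR):

#include <stdint.h>

/* CPACR sits at 0xE000ED88 in the Cortex-M System Control Space */
#define CPACR (*(volatile uint32_t *)0xE000ED88UL)

void fpu_enable(void)
{
    /* each slot has a 2-bit field at bit 2*slot; 0b11 = full access */
    CPACR |= (0x3u << (2 * 10)) | (0x3u << (2 * 11));
    __asm volatile ("dsb\n\tisb");  /* ensure the new setting takes effect */
}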

20
Loading FP Registers

• Floating point numbers can be loaded into the FPU registers in two ways:

• Directly from memory, where they are stored in FP format
• From an ARM GP register. In this case, the FPU will automatically convert them from the integer to FP format (possibly losing information)

21
Loading/Storing data to/from the FPU

VLDR.32 / VSTR.32 S#, [addressing mode]
VLDR.64 / VSTR.64 D#, [addressing mode]

[Diagram: CPU and FPU both attached to the bus, with MTC/MFC paths between them]

• Compared to most other processor architectures, ARM is peculiar because it allows a coprocessor to load data directly from memory.
• Most processors load data from memory to a GP register, and then from the GP register to the FP registers
• Note that the registers used to address memory are in the CPU and *NOT* in the FPU !!!

22
Moving FP data between GP and FP Registers

• Data transfers between GP and FP registers are implemented by the


move to/from coprocessor instructions

VMOV.f32 S#, R#
VMOV.f32 R#, S#
VMOV.f32 S#, S#
VMOV.f32 S#, immed

• NOTE: VMOV, VLDR, VSTR as well as all other floating point operations are
part of the ARM ISA, so they support all conditional execution suffixes

23
Double Precision FP move

• In order to support double precision arithmetic, we can transfer TWO


registers at the same time. Note: The GP registers can be independent,
but the FP registers must be consecutive

VMOV S#,S#,R#,R#
VMOV R#,R#,S#,S#
VMOV D#, R#,R#
VMOV R#,R#,D#

• Note that the last two are not independent operations, but a different way (aliasing) of writing the same operation: they correspond to the same machine code

24
Floating Point Processing Instructions

• All FP instructions have a similar format:

V<operation>{cond}.F32 dest, src1, src2

The most important operations supported by the ARM FPU are:

ABS / NEG / ADD / SUB / MUL / MLA / MLS / CMP / DIV / SQRT
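• To see these in context, a hedged C sketch (assuming GCC for a Cortex-M4 with -mfpu=fpv4-sp-d16 -mfloat-abi=hard; the actual instruction selection is up to the compiler):

float ops(float a, float b, float c)
{
    float s = a + b;   /* VADD.F32 */
    float d = a - b;   /* VSUB.F32 */
    float p = a * b;   /* VMUL.F32 */
    c += a * b;        /* may contract to VMLA.F32 (multiply-accumulate) */
    float q = a / b;   /* VDIV.F32 */
    return s + d + p + c + q;
}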

25
Format Conversion Instructions

• An additional set of instructions is used to convert numbers between integer and floating point formats, or between floating point formats of different precision
• Note that normally floats remain in the same format all through a computation, so these instructions would normally not be used, UNLESS there is a specific type cast in the C code

Instructions for format conversion: VCVTB, VCVTT, VCVT

int a;
float b;
main()
{ b = (float) a; }

26
Tutorial 1: Disassembling a simple FPU Code

Cortex M4 Version
C Code:
float a, b = 2.3, c = 3.4;
main()
{ a = b + c; }

[Screenshot: Cortex M4 disassembly; the two's complement bit patterns of the input floats appear in the literal pool]

[Screenshot: ARM7TDMI version; the addition is replaced by a call to an emulation function]

27
Tutorial 1: Profiling Information

[Profiling screenshot: most instructions take two cycles, some a single cycle]

ARM Cortex M4 (Using FPU): Total Time 13 cycles

28
Tutorial 2: Disassembling an FP Division

C Code:
float a, b = 2.3, c = 3.4;
main()
{ a = c / b; }

[Screenshot: disassembly and profiling; the VDIV takes 14 cycles]

Profiling: DIV = 14 cycles, all other operations 2 cycles, total 26 cycles


ARM Cortex M4 (Using FPU): Total Time 26 cycles

Note: For your reference, a SP MUL operation takes one cycle, an MLA 3 cycles

29
