Handbook of Floating-Point Arithmetic

This document reproduces the front matter and table of contents of the Handbook of Floating-Point Arithmetic, a book published in January 2010 by nine authors, including Jean-Michel Muller and Florent de Dinechin. Its ResearchGate record lists 421 citations and 8,292 reads. The handbook covers the history of floating-point arithmetic, basic definitions, standards such as IEEE 754, algorithms that exploit floating-point properties, hardware and software implementation of the basic operations, elementary functions, and extensions such as formal certification and extended precision.


See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/47735229

Handbook of Floating-Point Arithmetic

Book · January 2010

DOI: 10.1007/978-0-8176-4705-6 · Source: OAI

Citations: 421
Reads: 8,292

9 authors, including:

Jean-Michel Muller, Ecole normale supérieure de Lyon (307 publications, 4,390 citations)
Florent De Dinechin, Institut National des Sciences Appliquées de Lyon (136 publications, 2,973 citations)
Claude-Pierre Jeannerod, National Institute for Research in Computer Science and Control (90 publications, 1,354 citations)
Vincent Lefèvre, National Institute for Research in Computer Science and Control (78 publications, 2,023 citations)

Some of the authors of this publication are also working on these related projects:

GNU MPFR

All content following this page was uploaded by Florent De Dinechin on 29 May 2014.


Jean-Michel Muller
Nicolas Brisebarre
Florent de Dinechin
Claude-Pierre Jeannerod
Vincent Lefèvre
Guillaume Melquiond
Nathalie Revol
Damien Stehlé
Serge Torres

Handbook of
Floating-Point
Arithmetic

Birkhäuser
Boston • Basel • Berlin
Contents

Preface xv
List of Figures xvii
List of Tables xxi

I Introduction, Basic Definitions, and Standards 1

1 Introduction 3
1.1 Some History 3
1.2 Desirable Properties 6
1.3 Some Strange Behaviors 7
1.3.1 Some famous bugs 7
1.3.2 Difficult problems 8

2 Definitions and Basic Notions 13

2.1 Floating-Point Numbers 13
2.2 Rounding 20
2.2.1 Rounding modes 20
2.2.2 Useful properties 22
2.2.3 Relative error due to rounding 23
2.3 Exceptions 25
2.4 Lost or Preserved Properties of the Arithmetic on the Real Numbers 27
2.5 Note on the Choice of the Radix 29
2.5.1 Representation errors 29
2.5.2 A case for radix 10 30
2.6 Tools for Manipulating Floating-Point Errors 32
2.6.1 The ulp function 32
2.6.2 Errors in ulps and relative errors 37
2.6.3 An example: iterated products 37
2.6.4 Unit roundoff 39
2.7 Note on Radix Conversion 40

2.7.1 Conditions on the formats 40
2.7.2 Conversion algorithms 43
2.8 The Fused Multiply-Add (FMA) Instruction 51
2.9 Interval Arithmetic 51
2.9.1 Intervals with floating-point bounds 52
2.9.2 Optimized rounding 52

3 Floating-Point Formats and Environment 55

3.1 The IEEE 754-1985 Standard 56
3.1.1 Formats specified by IEEE 754-1985 56
3.1.2 Little-endian, big-endian 60
3.1.3 Rounding modes specified by IEEE 754-1985 61
3.1.4 Operations specified by IEEE 754-1985 62
3.1.5 Exceptions specified by IEEE 754-1985 66
3.1.6 Special values 69
3.2 The IEEE 854-1987 Standard 70
3.2.1 Constraints internal to a format 70
3.2.2 Various formats and the constraints between them 71
3.2.3 Conversions between floating-point numbers and decimal strings 72
3.2.4 Rounding 73
3.2.5 Operations 73
3.2.6 Comparisons 74
3.2.7 Exceptions 74
3.3 The Need for a Revision 74
3.3.1 A typical problem: "double rounding" 75
3.3.2 Various ambiguities 77
3.4 The New IEEE 754-2008 Standard 79
3.4.1 Formats specified by the revised standard 80
3.4.2 Binary interchange format encodings 81
3.4.3 Decimal interchange format encodings 82
3.4.4 Larger formats 92
3.4.5 Extended and extendable precisions 92
3.4.6 Attributes 93
3.4.7 Operations specified by the standard 97
3.4.8 Comparisons 99
3.4.9 Conversions 99
3.4.10 Default exception handling 100
3.4.11 Recommended transcendental functions 103
3.5 Floating-Point Hardware in Current Processors 104
3.5.1 The common hardware denominator 104
3.5.2 Fused multiply-add 104
3.5.3 Extended precision 104
3.5.4 Rounding and precision control 105

3.5.5 SIMD instructions 106
3.5.6 Floating-point on x86 processors: SSE2 versus x87 106
3.5.7 Decimal arithmetic 107
3.6 Floating-Point Hardware in Recent Graphics Processing Units 108
3.7 Relations with Programming Languages 109
3.7.1 The Language Independent Arithmetic (LIA) standard 109
3.7.2 Programming languages 110
3.8 Checking the Environment 110
3.8.1 MACHAR 111
3.8.2 Paranoia 111
3.8.3 UCBTest 115
3.8.4 TestFloat 116
3.8.5 IeeeCC754 116
3.8.6 Miscellaneous 116

II Cleverly Using Floating-Point Arithmetic 117

4 Basic Properties and Algorithms 119

4.1 Testing the Computational Environment 119
4.1.1 Computing the radix 119
4.1.2 Computing the precision 121
4.2 Exact Operations 122
4.2.1 Exact addition 122
4.2.2 Exact multiplications and divisions 124
4.3 Accurate Computations of Sums of Two Numbers 125
4.3.1 The Fast2Sum algorithm 126
4.3.2 The 2Sum algorithm 129
4.3.3 If we do not use rounding to nearest 131
4.4 Computation of Products 132
4.4.1 Veltkamp splitting 132
4.4.2 Dekker's multiplication algorithm 135
4.5 Complex Numbers 139
4.5.1 Various error bounds 140
4.5.2 Error bound for complex multiplication 141
4.5.3 Complex division 144
4.5.4 Complex square root 149

5 The Fused Multiply-Add Instruction 151

5.1 The 2MultFMA Algorithm 152
5.2 Computation of Residuals of Division and Square Root 153
5.3 Newton-Raphson-Based Division with an FMA 155
5.3.1 Variants of the Newton-Raphson iteration 155
5.3.2 Using the Newton-Raphson iteration for correctly rounded division 160
5.4 Newton-Raphson-Based Square Root with an FMA 167
5.4.1 The basic iterations 167
5.4.2 Using the Newton-Raphson iteration for correctly rounded square roots 168
5.5 Multiplication by an Arbitrary-Precision Constant 171
5.5.1 Checking for a given constant C if Algorithm 5.2 will always work 172
5.6 Evaluation of the Error of an FMA 175
5.7 Evaluation of Integer Powers 177

6 Enhanced Floating-Point Sums, Dot Products, and Polynomial Values 181
6.1 Preliminaries 182
6.1.1 Floating-point arithmetic models 183
6.1.2 Notation for error analysis and classical error estimates 184
6.1.3 Properties for deriving running error bounds 187
6.2 Computing Validated Running Error Bounds 188
6.3 Computing Sums More Accurately 190
6.3.1 Reordering the operands, and a bit more 190
6.3.2 Compensated sums 192
6.3.3 Implementing a "long accumulator" 199
6.3.4 On the sum of three floating-point numbers 199
6.4 Compensated Dot Products 201
6.5 Compensated Polynomial Evaluation 203

7 Languages and Compilers 205

7.1 A Play with Many Actors 205
7.1.1 Floating-point evaluation in programming languages 206
7.1.2 Processors, compilers, and operating systems 208
7.1.3 In the hands of the programmer 209
7.2 Floating Point in the C Language 209
7.2.1 Standard C99 headers and IEEE 754-1985 support 209
7.2.2 Types 210
7.2.3 Expression evaluation 213
7.2.4 Code transformations 216
7.2.5 Enabling unsafe optimizations 217
7.2.6 Summary: a few horror stories 218
7.3 Floating-Point Arithmetic in the C++ Language 220
7.3.1 Semantics 220
7.3.2 Numeric limits 221
7.3.3 Overloaded functions 222
7.4 FORTRAN Floating Point in a Nutshell 223
7.4.1 Philosophy 223
7.4.2 IEEE 754 support in FORTRAN 226
7.5 Java Floating Point in a Nutshell 227
7.5.1 Philosophy 227
7.5.2 Types and classes 228
7.5.3 Infinities, NaNs, and signed zeros 230
7.5.4 Missing features 231
7.5.5 Reproducibility 232
7.5.6 The BigDecimal package 233
7.6 Conclusion 234

III Implementing Floating-Point Operators 237
8 Algorithms for the Five Basic Operations 239
8.1 Overview of Basic Operation Implementation 239
8.2 Implementing IEEE 754-2008 Rounding 241
8.2.1 Rounding a nonzero finite value with unbounded exponent range 241
8.2.2 Overflow 243
8.2.3 Underflow and subnormal results 244
8.2.4 The inexact exception 245
8.2.5 Rounding for actual operations 245
8.3 Floating-Point Addition and Subtraction 246
8.3.1 Decimal addition 249
8.3.2 Decimal addition using binary encoding 250
8.3.3 Subnormal inputs and outputs in binary addition 251
8.4 Floating-Point Multiplication 251
8.4.1 Normal case 252
8.4.2 Handling subnormal numbers in binary multiplication 252
8.4.3 Decimal specifics 253
8.5 Floating-Point Fused Multiply-Add 254
8.5.1 Case analysis for normal inputs 254
8.5.2 Handling subnormal inputs 258
8.5.3 Handling decimal cohorts 259
8.5.4 Overview of a binary FMA implementation 259
8.6 Floating-Point Division 262
8.6.1 Overview and special cases 262
8.6.2 Computing the significand quotient 263
8.6.3 Managing subnormal numbers 264
8.6.4 The inexact exception 265
8.6.5 Decimal specifics 265
8.7 Floating-Point Square Root 265
8.7.1 Overview and special cases 265
8.7.2 Computing the significand square root 266
8.7.3 Managing subnormal numbers 267
8.7.4 The inexact exception 267
8.7.5 Decimal specifics 267

9 Hardware Implementation of Floating-Point Arithmetic 269

9.1 Introduction and Context 269
9.1.1 Processor internal formats 269
9.1.2 Hardware handling of subnormal numbers 270
9.1.3 Full-custom VLSI versus reconfigurable circuits 271
9.1.4 Hardware decimal arithmetic 272
9.1.5 Pipelining 273
9.2 The Primitives and Their Cost 274
9.2.1 Integer adders 274
9.2.2 Digit-by-integer multiplication in hardware 280
9.2.3 Using nonstandard representations of numbers 280
9.2.4 Binary integer multiplication 281
9.2.5 Decimal integer multiplication 283
9.2.6 Shifters 284
9.2.7 Leading-zero counters 284
9.2.8 Tables and table-based methods for fixed-point function approximation 286
9.3 Binary Floating-Point Addition 288
9.3.1 Overview 288
9.3.2 A first dual-path architecture 289
9.3.3 Leading-zero anticipation 291
9.3.4 Probing further on floating-point adders 295
9.4 Binary Floating-Point Multiplication 296
9.4.1 Basic architecture 296
9.4.2 FPGA implementation 296
9.4.3 VLSI implementation optimized for delay 298
9.4.4 Managing subnormals 301
9.5 Binary Fused Multiply-Add 302
9.5.1 Classic architecture 303
9.5.2 To probe further 305
9.6 Division 305
9.6.1 Digit-recurrence division 306
9.6.2 Decimal division 309
9.7 Conclusion: Beyond the FPU 309
9.7.1 Optimization in context of standard operators 310
9.7.2 Operation with a constant operand 311
9.7.3 Block floating point 313
9.7.4 Specific architectures for accumulation 313
9.7.5 Coarser-grain operators 317
9.8 Probing Further 320

10 Software Implementation of Floating-Point Arithmetic 321

10.1 Implementation Context 322
10.1.1 Standard encoding of binary floating-point data 322
10.1.2 Available integer operators 323
10.1.3 First examples 326
10.1.4 Design choices and optimizations 328
10.2 Binary Floating-Point Addition 329
10.2.1 Handling special values 330
10.2.2 Computing the sign of the result 332
10.2.3 Swapping the operands and computing the alignment shift 333
10.2.4 Getting the correctly rounded result 335
10.3 Binary Floating-Point Multiplication 341
10.3.1 Handling special values 341
10.3.2 Sign and exponent computation 343
10.3.3 Overflow detection 345
10.3.4 Getting the correctly rounded result 346
10.4 Binary Floating-Point Division 349
10.4.1 Handling special values 350
10.4.2 Sign and exponent computation 351
10.4.3 Overflow detection 354
10.4.4 Getting the correctly rounded result 355
10.5 Binary Floating-Point Square Root 361
10.5.1 Handling special values 362
10.5.2 Exponent computation 364
10.5.3 Getting the correctly rounded result 365

IV Elementary Functions 373

11 Evaluating Floating-Point Elementary Functions 375
11.1 Basic Range Reduction Algorithms 379
11.1.1 Cody and Waite's reduction algorithm 379
11.1.2 Payne and Hanek's algorithm 381
11.2 Bounding the Relative Error of Range Reduction 382
11.3 More Sophisticated Range Reduction Algorithms 384
11.3.1 An example of range reduction for the exponential function 386
11.3.2 An example of range reduction for the logarithm 387
11.4 Polynomial or Rational Approximations 388
11.4.1 L2 case 389
11.4.2 L∞, or minimax case 390
11.4.3 "Truncated" approximations 392
11.5 Evaluating Polynomials 393
11.6 Correct Rounding of Elementary Functions to binary64 394
11.6.1 The Table Maker's Dilemma and Ziv's onion peeling strategy 394
11.6.2 When the TMD is solved 395
11.6.3 Rounding test 396
11.6.4 Accurate second step 400
11.6.5 Error analysis and the accuracy/performance tradeoff 401
11.7 Computing Error Bounds 402
11.7.1 The point with efficient code 402
11.7.2 Example: a "double-double" polynomial evaluation 403
12 Solving the Table Maker's Dilemma 405
12.1 Introduction 405
12.1.1 The Table Maker's Dilemma 406
12.1.2 Brief history of the TMD 410
12.1.3 Organization of the chapter 411
12.2 Preliminary Remarks on the Table Maker's Dilemma 412
12.2.1 Statistical arguments: what can be expected in practice 412
12.2.2 In some domains, there is no need to find worst cases 416
12.2.3 Deducing the worst cases from other functions or domains 419
12.3 The Table Maker's Dilemma for Algebraic Functions 420
12.3.1 Algebraic and transcendental numbers and functions 420
12.3.2 The elementary case of quotients 422
12.3.3 Around Liouville's theorem 424
12.3.4 Generating bad rounding cases for the square root using Hensel 2-adic lifting 425
12.4 Solving the Table Maker's Dilemma for Arbitrary Functions 429
12.4.1 Lindemann's theorem: application to some transcendental functions 429
12.4.2 A theorem of Nesterenko and Waldschmidt 430
12.4.3 A first method: tabulated differences 432
12.4.4 From the TMD to the distance between a grid and a segment 434
12.4.5 Linear approximation: Lefèvre's algorithm 436
12.4.6 The SLZ algorithm 443
12.4.7 Periodic functions on large arguments 448
12.5 Some Results 449
12.5.1 Worst cases for the exponential, logarithmic, trigonometric, and hyperbolic functions 449
12.5.2 A special case: integer powers 458
12.6 Current Limits and Perspectives 458

V Extensions 461
13 Formalisms for Certifying Floating-Point Algorithms 463
13.1 Formalizing Floating-Point Arithmetic 463
13.1.1 Defining floating-point numbers 464
13.1.2 Simplifying the definition 466
13.1.3 Defining rounding operators 467
13.1.4 Extending the set of numbers 470
13.2 Formalisms for Certifying Algorithms by Hand 471
13.2.1 Hardware units 471
13.2.2 Low-level algorithms 472
13.2.3 Advanced algorithms 473
13.3 Automating Proofs 474
13.3.1 Computing on bounds 475
13.3.2 Counting digits 477
13.3.3 Manipulating expressions 479
13.3.4 Handling the relative error 483
13.4 Using Gappa 484
13.4.1 Toy implementation of sine 484
13.4.2 Integer division on Itanium 488
14 Extending the Precision 493
14.1 Double-Words, Triple-Words. 494
14.1.1 Double-word arithmetic 495
14.1.2 Static triple-word arithmetic 498
14.1.3 Quad-word arithmetic 500
14.2 Floating-Point Expansions 503
14.3 Floating-Point Numbers with Batched Additional Exponent 509
14.4 Large Precision Relying on Processor Integers 510
14.4.1 Using arbitrary-precision integer arithmetic for arbitrary-precision floating-point arithmetic 512
14.4.2 A brief introduction to arbitrary-precision integer arithmetic 513

VI Perspectives and Appendix 517

15 Conclusion and Perspectives 519

16 Appendix: Number Theory Tools for Floating-Point Arithmetic 521

16.1 Continued Fractions 521
16.2 The LLL Algorithm 524

Bibliography 529

Index 567
