
Proceedings of the 2023 International Conference on Software Engineering and Machine Learning

DOI: 10.54254/2755-2721/8/20230092

Research and analysis of floating-point adder principle

Fengyuan Yang
School of Materials Science and Engineering, Northeastern University, Shenyang,
Liaoning Province, China, 110819

[email protected]

Abstract. As computers come to be used ever more widely, the research and development of the adder, the most basic arithmetic unit, shapes the development of the whole computer field. This paper analyzes the principles of the one-bit adder and the floating-point adder through literature analysis. The one-bit adder is the most basic type of traditional adder; other traditional designs include the ripple-carry adder and the carry-lookahead adder. The purpose of this paper is to explain the basic principles of adders, among which the IEEE-754 binary floating-point standard is particularly important, and to show that the traditional fixed-point adder is the basis of the floating-point adder, which suggests new directions for the future optimization of floating-point adders. This paper finds that the floating-point adder is one of the most widely used components in today's signal processing systems, and its improvement is therefore necessary.

Keywords: One-bit adder, floating-point adder, IEEE-754.

1. Introduction
Human society has entered the information age, and information computing and storage technologies form its foundation. Computers, microelectronics, and communication technologies are already the core technologies driving social progress.
In a microelectronic processing system, the four basic arithmetic operations (addition, subtraction, multiplication, division) can all be reduced to addition, so the adder is a very important arithmetic unit in computer logic. In addition, the adder can perform program counting and compute effective addresses[1]. During operation, data are processed and stored as 0s and 1s, and data types are divided into fixed-point and floating-point, with the floating-point representation widely used. According to Stuart F. Oberman, more than 55% of basic floating-point operations are additions, so the floating-point adder is one of the most significant components of the microprocessor[2]. Fixed-point adders are the most basic and commonly used part of various digital systems and are also used throughout floating-point operations: the mantissa module in floating-point addition is essentially a fixed-point addition. Therefore, understanding the fixed-point adder and designing a high-speed fixed-point adder is essential to improving the performance of floating-point adders.

© 2023 The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0
(https://creativecommons.org/licenses/by/4.0/).


This paper introduces the one-bit adder, the most basic fixed-point adder, through literature review and summarization, in order to explain its internal principle and give a preliminary understanding of how adders work. The third part introduces the floating-point adder, focusing on the IEEE-754 binary floating-point standard. This research can help beginners understand floating-point adders and support subsequent research on high-performance floating-point adders.

2. One-bit adder
The one-bit adder is the most basic type of adder; other, higher-performance adders are designed on its basis. It includes the half adder and the full adder.

2.1. Half adder

The half adder does not take a carry input from the lower bit into account, while its output includes the carry to the higher bit. S0 is the sum of the inputs and R1 is the carry output. Its logical expressions are S0 = A0 ⊕ B0, R1 = A0 × B0.
Table 1. Half adder truth table[3].
A0 B0 S0 R1
0 0 0 0
0 1 1 0
1 0 1 0
1 1 0 1
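
As a concrete check of these expressions, the following C program (a minimal sketch of my own, not part of the original paper) enumerates all four input combinations and reproduces Table 1:

/* Software model of the half adder: S0 = A0 XOR B0, R1 = A0 AND B0. */
#include <stdio.h>

int main(void) {
    for (int a0 = 0; a0 <= 1; a0++) {
        for (int b0 = 0; b0 <= 1; b0++) {
            int s0 = a0 ^ b0;   /* sum bit   */
            int r1 = a0 & b0;   /* carry-out */
            printf("A0=%d B0=%d -> S0=%d R1=%d\n", a0, b0, s0, r1);
        }
    }
    return 0;
}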

2.2. Full adder

The full adder sums the carry input Ri from the lower bit together with the two operand inputs Ai and Bi, producing a carry Ri+1 to the higher bit and the current sum Si. Its logical expressions are Si = Ri ⊕ Ai ⊕ Bi, Ri+1 = Ri × Ai + Ri × Bi + Ai × Bi.
Table 2. Full adder truth table[4].

Ri Ai Bi Ri+1 Si
0 0 0 0 0
0 0 1 0 1
0 1 0 0 1
0 1 1 1 0
1 0 0 0 1
1 0 1 1 0
1 1 0 1 0
1 1 1 1 1
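
The same expressions can be modelled in software. The sketch below (my own illustration, with a hypothetical full_adder helper) chains four full adders into a 4-bit ripple-carry adder, the bit-by-bit carry design mentioned in the abstract:

/* Full adder: Si = Ri XOR Ai XOR Bi, Ri+1 = Ri*Ai + Ri*Bi + Ai*Bi,
   chained so each stage's carry-out feeds the next stage's carry-in. */
#include <stdio.h>

static void full_adder(int ri, int ai, int bi, int *si, int *ri1) {
    *si  = ri ^ ai ^ bi;
    *ri1 = (ri & ai) | (ri & bi) | (ai & bi);
}

int main(void) {
    /* Add 0101 (5) and 0110 (6) bit by bit, carry rippling upward. */
    int a[4] = {1, 0, 1, 0};   /* LSB first: 5 */
    int b[4] = {0, 1, 1, 0};   /* LSB first: 6 */
    int s[4], carry = 0;
    for (int i = 0; i < 4; i++)
        full_adder(carry, a[i], b[i], &s[i], &carry);
    printf("sum = %d%d%d%d (MSB..LSB), carry-out = %d\n",
           s[3], s[2], s[1], s[0], carry);   /* prints 1011 = 11 */
    return 0;
}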

3. Floating point adder

3.1. The representation of floating point numbers


There are two methods of representing data in the logical operations of computers: fixed-point numbers and floating-point numbers. Floating-point numbers are widely used because they vary the exponent within a certain range to move the position of the binary point as needed, thus representing a larger range of real numbers than fixed-point numbers can[5].


The true value of a floating-point number is determined by the base R (implicitly 2), the exponent-marker E, and the mantissa M, together with the sign, which indicates whether the number is positive or negative. The exponent-marker E is a fixed-point integer, represented in complement or biased (shift) code; its number of bits determines the range of representable values. The mantissa M is a fixed-point fraction, represented in sign-magnitude or complement code; its number of bits determines the precision of the number.
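As a concrete illustration (my addition, not from the original text), the true value v of a normalized number can be written in the same notation as the formulas above:

v = (-1)^S × 1.M × R^(E - Bias), with R = 2.

For example, 6.5 = 110.1 in binary = 1.101 × 2^2, so in single precision S = 0, the stored exponent-marker is E = 2 + 127 = 129, and M is 101 followed by twenty zeros.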
The IEEE-754 binary floating-point standard was developed by the Institute of Electrical and Electronics Engineers in 1985 and has been the industry standard for floating-point operations ever since[6].

3.2. IEEE-754 binary floating point operation

3.2.1. Four formats. The IEEE-754 standard defines four formats: single-precision floating-point, double-precision floating-point, extended double-precision floating-point (SPARC), and extended double-precision floating-point (x86), as summarized in Table 3.
Table 3. Bit distribution of the four floating-point formats[5].
Format                                             Total bits   Sign bit S   Exponent-marker E   Mantissa M
Single-precision floating-point                    32 bit       1 bit        8 bit               23 bit
Double-precision floating-point                    64 bit       1 bit        11 bit              52 bit
Extended double-precision floating-point (SPARC)   128 bit      1 bit        15 bit              112 bit
Extended double-precision floating-point (x86)     80 bit       1 bit        15 bit              63 bit + 1 bit (explicit leading significant bit)

3.2.2. Single-precision floating-point. Single-precision floating-point numbers have three fields: the 23-bit mantissa M, the 8-bit exponent-marker E, and the 1-bit sign S. Bits 0 to 22 of the 32-bit word are the mantissa M, where bit 0 is the least significant bit; for a normalized number, the first bit to the left of the binary point must be 1. In floating-point addition, the calculation is generally achieved by shifting. Since the leading digit of a normalized number is always 1, the 1 before the binary point need not be stored, which effectively gives the mantissa one extra binary bit of precision. Bits 23 to 30 are the exponent-marker E, where bit 23 is its least significant bit; 8 bits can represent stored exponent values between 0 and 255. Because the true exponent can be either positive or negative, a bias (Bias = 127) is added to the true exponent, and the sum is the value stored in the exponent field. Bit 31 is the sign bit, where positive numbers are represented by 0 and negative numbers by 1.
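The field layout just described can be checked with a short C program (a sketch of my own, not from the paper) that reinterprets the bits of 6.5f and prints the three fields:

/* Decode the sign, exponent-marker, and mantissa fields of a float. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    float f = 6.5f;                       /* 1.101 binary x 2^2    */
    uint32_t u;
    memcpy(&u, &f, sizeof u);             /* view the 32 raw bits  */

    uint32_t sign     = u >> 31;          /* bit 31                */
    uint32_t exponent = (u >> 23) & 0xFF; /* bits 23..30           */
    uint32_t mantissa = u & 0x7FFFFF;     /* bits 0..22            */

    /* Expected: S=0, E=129 (true exponent 2 plus Bias 127),
       M=0x500000 (binary .101 followed by zeros). */
    printf("S=%u E=%u (true exp %d) M=0x%06X\n",
           (unsigned)sign, (unsigned)exponent,
           (int)exponent - 127, (unsigned)mantissa);
    return 0;
}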

3.2.3. Double-precision floating-point. A double-precision floating-point number has 64 bits: 52 stored mantissa bits M (53 significant bits counting the implicit leading 1), 11 exponent-marker bits E, and one sign bit S. Bits 0 through 51 are the mantissa M, with bit 0 the least significant bit. Bits 52 to 62 are the 11-bit exponent-marker E, with bit 52 its least significant bit; 11 bits can represent stored exponent values between 0 and 2047. Because the true exponent can be negative, a bias (Bias = 1023) is added to the true exponent, and the sum is the value stored in the exponent field. Bit 63 is the sign bit, where positive numbers are represented by 0 and negative numbers by 1.
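A corresponding sketch for double precision (again my own illustration) reads the fields of 1.0, whose true exponent is 0, so the stored exponent-marker should print exactly the bias, 1023:

/* Decode the double-precision fields: sign in bit 63, 11-bit
   exponent-marker in bits 52..62, 52 mantissa bits in bits 0..51. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    double d = 1.0;
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    printf("S=%llu E=%llu M=0x%013llX\n",
           (unsigned long long)(u >> 63),
           (unsigned long long)((u >> 52) & 0x7FF),   /* expect 1023 */
           (unsigned long long)(u & 0xFFFFFFFFFFFFFULL));
    return 0;
}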

3.2.4. Extended double-precision floating-point (SPARC). The SPARC floating-point format is a quadruple-precision format that occupies four 32-bit words: 112 mantissa bits, 15 exponent-marker bits, and 1 sign bit. Bits 0 through 111 are the mantissa M, bits 112 through 126 are the exponent-marker E, and bit 127 is the sign bit S.

3.2.5. Extended double-precision floating-point (x86). The extended double-precision (x86) format numbers three consecutive 32-bit words, 96 bits in all. Bits 0 through 63 store the 64-bit mantissa, bits 64 to 78 store the 15-bit exponent-marker, and bit 79 stores the sign bit. However, the format actually uses only 80 bits; the upper 16 bits of the highest 32-bit word are left unused.

3.3. Traditional floating point addition algorithm

The basic algorithm of floating-point addition is to add the mantissas while ensuring that the exponent-markers involved in the operation are equal. Suppose there are two floating-point numbers A and B in double-precision format. The operation flow is as follows (a minimal code sketch of this flow follows the list).
a. Exponent subtraction: compare the exponents of A and B and subtract them to obtain the absolute value of the difference, d. This value d is the number of places the mantissa of the smaller number is shifted in the following alignment step.
b. Exponent alignment: shift the mantissa of the smaller number right by d places so that the exponents of the two numbers are aligned, i.e., the two numbers have the same exponent[7].
c. Mantissa addition: add the significant digits of the two mantissas according to their signs, now that A and B have the same exponent.
d. Conversion: the intermediate result is held in complement form; when it is negative, it is converted to the form of a negative sign followed by the mantissa, with the mantissa expressed in true (sign-magnitude) code.
e. Mantissa normalization: shift the mantissa left or right according to the position of the leading 1, and adjust the exponent-marker so that the mantissa takes the form 1.f[7].
f. Rounding: round the final result; if rounding makes the result non-normalized, the significand is shifted right by 1 bit and the exponent-marker of the larger number is incremented by 1.
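
The following C sketch (my own simplification, using a hypothetical fp_add_sketch function) walks through steps a, b, c, e, and f for two positive, normalized single-precision operands; signs, step d, and true IEEE rounding are deliberately omitted:

/* Minimal sketch of the floating-point addition flow above.
   Assumptions (mine, not the paper's): both inputs positive and
   normalized; rounding is simplified to truncation. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static float fp_add_sketch(float a, float b) {
    uint32_t ua, ub;
    memcpy(&ua, &a, sizeof ua);               /* raw bits of a */
    memcpy(&ub, &b, sizeof ub);               /* raw bits of b */

    /* Decode: 8-bit biased exponent, 23-bit mantissa, hidden 1. */
    uint32_t ea = (ua >> 23) & 0xFF, eb = (ub >> 23) & 0xFF;
    uint64_t ma = (ua & 0x7FFFFF) | 0x800000; /* restore hidden bit */
    uint64_t mb = (ub & 0x7FFFFF) | 0x800000;

    /* Steps a and b: exponent difference d, then shift the mantissa
       of the smaller exponent right by d places to align. */
    uint32_t e = ea;
    if (ea >= eb) { mb >>= (ea - eb); }
    else          { ma >>= (eb - ea); e = eb; }

    /* Step c: add the aligned mantissas (both operands positive). */
    uint64_t m = ma + mb;

    /* Step e: if the sum carried past bit 23, shift right once and
       bump the exponent so the mantissa is again of the form 1.f. */
    if (m & 0x1000000) { m >>= 1; e += 1; }

    /* Step f is truncation here; re-pack, dropping the hidden bit. */
    uint32_t ur = (e << 23) | (uint32_t)(m & 0x7FFFFF);
    float r;
    memcpy(&r, &ur, sizeof r);
    return r;
}

int main(void) {
    printf("%g\n", fp_add_sketch(1.5f, 2.25f));  /* prints 3.75 */
    return 0;
}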

4. Conclusion
This paper discusses the principles of the one-bit adder and the IEEE-754 binary floating-point standard, and analyzes the traditional floating-point addition algorithm, in particular the four floating-point formats and the differences between them. However, this paper only introduces the basics of these standards through a brief summary of other literature and books; due to time constraints, the actual research and design of a floating-point adder is not yet covered. Future research can explore improving existing floating-point adders and developing high-performance, low-power floating-point adders.

References
[1] Wang Dong, Li Zhentao, Mao Erkun, Li Baofeng. CMOS VLSI Design (3rd Edition). Beijing: China Electric Power Press, 2008.
[2] Oberman, Stuart F. Design Issues in High Performance Floating Point Arithmetic Units [D]. Stanford University, PhD dissertation, 1996.
[3] Ji Chao, Li Tuo, Zou Xiaofeng, Zhang Lu. (2022). Design of combinatorial logic circuit based on memristor. Semiconductor Technology (08), 649-659. doi:10.13290/j.cnki.bdtjs.2022.08.010.
[4] Dai Guangzhen, Zhao Zhenyu, Song Xingwen, Han Mingjun, Ni Tianming. (2023). Memristor hybrid logic circuit design and its application. Science in China: Information Science (01), 178-190.
[5] Wang Dayu. (2012). Research and Design of High-Performance Floating-Point Adders (Master's thesis, Nanjing University of Aeronautics and Astronautics). https://kns.cnki.net/KCMS/detail/detail.aspx?dbname=CMFD201301&filename=1012041598.nh
[6] IEEE Std 754-1985: IEEE Standard for Binary Floating-Point Arithmetic, IEEE, 1985.
[7] Feng Wei. (2009). Optimization Design of a Fast Floating-Point Adder (Master's thesis, University of Science and Technology of China). https://kns.cnki.net/KCMS/detail/detail.aspx?dbname=CMFD2010&filename=2010018994.nh
