0% found this document useful (0 votes)
7 views14 pages

Stackoverflow Com ...

The document discusses the differences between float and double data types in programming, highlighting that double has double the precision of float, with 15 decimal digits compared to 7. It explains the implications of using each type, such as potential truncation errors and the risk of reaching 'infinity' with float more easily than with double. The document also advises on best practices for using floating-point numbers, including the use of int for counting and considering the Kahan summation algorithm to minimize errors.

Uploaded by

juliettehailand
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

What is the difference between float and double?

Asked 14 years, 4 months ago Modified 6 months ago Viewed 1.2m times

539

I've read about the difference between double precision and single
precision. However, in most cases, float and double seem to be
interchangeable, i.e. using one or the other does not seem to affect
the results. Is this really the case? When are floats and doubles
interchangeable? What are the differences between them?

c++ c floating-point precision ieee-754

edited Dec 31, 2021 at 9:51


TheMaster
48.3k 7 66 92
asked Mar 5, 2010 at 12:48

VaioIsBorn
7,853 9 32 29

14 Answers

641

Huge difference.

As the name implies, a double has 2x the precision of float [1]. In
general a double has 15 decimal digits of precision, while float
has 7.
Here's how the number of digits is calculated:

double has 52 mantissa bits + 1 hidden bit:

    log(2^53) ÷ log(10) = 15.95 digits

float has 23 mantissa bits + 1 hidden bit:

    log(2^24) ÷ log(10) = 7.22 digits
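The same arithmetic can be sketched in C. This is only an illustration: it assumes the `<float.h>` macros `FLT_MANT_DIG` and `DBL_MANT_DIG` report 24 and 53 bits (true for IEEE 754 binary32/binary64), and the names `decimal_digits` and `print_decimal_digits` are made up for this example. It uses the identity log(2^p) ÷ log(10) = p × log10(2).

```c
#include <float.h>
#include <stdio.h>

/* Hypothetical helper: decimal digits carried by a binary significand
   of `bits` bits, i.e. log10(2^bits) = bits * log10(2). */
double decimal_digits(int bits)
{
    const double log10_of_2 = 0.30102999566398120; /* log10(2) */
    return bits * log10_of_2;
}

/* Prints the digit counts for this platform's float and double. */
void print_decimal_digits(void)
{
    printf("float:  %d bits -> %.2f digits\n",
           FLT_MANT_DIG, decimal_digits(FLT_MANT_DIG));
    printf("double: %d bits -> %.2f digits\n",
           DBL_MANT_DIG, decimal_digits(DBL_MANT_DIG));
}
```

Calling `print_decimal_digits()` from `main` on an IEEE 754 platform should report roughly 7.22 and 15.95 digits, matching the figures above.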

This precision loss could lead to greater truncation errors being
accumulated when repeated calculations are done, e.g.

    float a = 1.f / 81;
    float b = 0;
    for (int i = 0; i < 729; ++ i)
        b += a;
    printf("%.7g\n", b); // prints 9.000023

while

    double a = 1.0 / 81;
    double b = 0;
    for (int i = 0; i < 729; ++ i)
        b += a;
    printf("%.15g\n", b); // prints 8.99999999999996

Also, the maximum value of float is about 3.4e38, but double is
about 1.8e308, so using float can hit "infinity" (i.e. a special
floating-point number) much more easily than double for
something simple, e.g. computing the factorial of 60.

During testing, maybe a few test cases contain these huge numbers,
which may cause your programs to fail if you use floats.
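A minimal sketch of the factorial case, assuming float and double are IEEE 754 binary32 and binary64. The function names are invented for this example. The exact value of 60! is about 8.32e81, far above the float maximum but well inside the double range.

```c
#include <float.h>
#include <stdio.h>

/* n! accumulated in a float; the product overflows to +infinity once
   it exceeds FLT_MAX (about 3.4e38 for IEEE binary32). */
float factorial_float(int n)
{
    float r = 1.0f;
    for (int i = 2; i <= n; ++i)
        r *= (float)i;
    return r;
}

/* Same computation accumulated in a double. */
double factorial_double(int n)
{
    double r = 1.0;
    for (int i = 2; i <= n; ++i)
        r *= (double)i;
    return r;
}

/* Prints both results for 60!. */
void print_factorial_60(void)
{
    printf("float:  %g\n", factorial_float(60));
    printf("double: %g\n", factorial_double(60));
}
```

Under these assumptions the float version already overflows at 35!, while the double version stays finite up to 170!.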

Of course, sometimes, even double isn't accurate enough, hence
we sometimes have long double [1] (the above example gives
9.000000000000000066 on Mac), but all floating point types suffer
from round-off errors, so if precision is very important (e.g. money
processing) you should use int or a fraction class.

Furthermore, don't use += to sum lots of floating point numbers,
as the errors accumulate quickly. If you're using Python, use fsum.
Otherwise, try to implement the Kahan summation algorithm.
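For reference, here is one way Kahan (compensated) summation might look in C, applied to the same 729 × (1/81) experiment. It is a sketch, not a tuned implementation: the function names are invented here, and it assumes the compiler is not allowed to reassociate floating-point operations (for example, no `-ffast-math`), since that would optimize the compensation away.

```c
#include <stddef.h>
#include <stdio.h>

/* Plain accumulation with +=. */
float naive_sum(const float *x, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i)
        sum += x[i];
    return sum;
}

/* Kahan summation: c carries the low-order bits that were lost when
   the previous term was added to sum. */
float kahan_sum(const float *x, size_t n)
{
    float sum = 0.0f;
    float c = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float y = x[i] - c;   /* apply the correction to the next term */
        float t = sum + y;    /* low-order bits of y may be lost here  */
        c = (t - sum) - y;    /* recover what was lost (negated)       */
        sum = t;
    }
    return sum;
}

/* Repeats the 729 * (1/81) experiment with both methods. */
void compare_sums(void)
{
    float x[729];
    for (size_t i = 0; i < 729; ++i)
        x[i] = 1.0f / 81;
    printf("naive: %.7g\n", naive_sum(x, 729));
    printf("kahan: %.7g\n", kahan_sum(x, 729));
}
```

The compensated sum should land within about one float rounding step of 9, whereas the plain loop drifts as described above.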

[1]: The C and C++ standards do not specify the representation of float, double
and long double. It is possible that all three are implemented as IEEE
double-precision. Nevertheless, for most architectures (gcc, MSVC; x86, x64, ARM) float is
indeed an IEEE single-precision floating point number (binary32), and double is an
IEEE double-precision floating point number (binary64).

edited Jun 20, 2020 at 9:12


Community Bot
1 1

answered Mar 5, 2010 at 13:06


kennytm
519k 108 1.1k 1k

20 The usual advice for summation is to sort your floating point numbers by
magnitude (smallest first) before summing.
– R.. GitHub STOP HELPING ICE Aug 6, 2010 at 9:49

2 Note that while C/C++ float and double are nearly always IEEE single
and double precision respectively, C/C++ long double is far more
variable depending on your CPU, compiler and OS. Sometimes it's the
same as double, sometimes it's some system-specific extended format,
sometimes it's IEEE quad precision. – plugwash Feb 8, 2019 at 5:27

@R..GitHubSTOPHELPINGICE: why? Could you explain?
– Sreeraj Chundayil Jan 2, 2020 at 7:27

2 @InQusitive: Consider for example an array consisting of the value 2^24
followed by 2^24 repetitions of the value 1. Summing in order produces
2^24. Reversing produces 2^25. Of course you can make examples (e.g.
make it 2^25 repetitions of 1) where any order ends up being
catastrophically wrong with a single accumulator but
smallest-magnitude-first is the best among such. To do better you need some
kind of tree. – R.. GitHub STOP HELPING ICE Jan 2, 2020 at 15:18
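The 2^24 example from this comment can be reproduced with a short C sketch. It assumes float is IEEE 754 binary32 and that intermediate results are rounded to float (as on x86-64 with SSE). The function names are made up for illustration.

```c
#define BIG   16777216.0f  /* 2^24 */
#define COUNT 16777216L    /* 2^24 copies of 1.0f */

/* Large value first: 2^24 + 1 is not representable in a float and
   rounds back to 2^24, so the sum never moves. */
float sum_large_first(void)
{
    float sum = BIG;
    for (long i = 0; i < COUNT; ++i)
        sum += 1.0f;
    return sum;
}

/* Small values first: the ones add up exactly to 2^24, then adding
   the large value gives 2^25. */
float sum_small_first(void)
{
    float sum = 0.0f;
    for (long i = 0; i < COUNT; ++i)
        sum += 1.0f;
    return sum + BIG;
}
```

Under those assumptions `sum_large_first()` returns 16777216 (2^24) and `sum_small_first()` returns 33554432 (2^25), the mathematically correct total.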

3 @R..GitHubSTOPHELPINGICE: summing is even more tricky if the array
contains both positive and negative numbers. – chqrlie Sep 7, 2020 at
8:59

64

Here is what the standard C99 (ISO-IEC 9899 6.2.5 §10) or C++2003
(ISO-IEC 14882-2003 3.9.1 §8) standards say:

    There are three floating point types: float, double, and
    long double. The type double provides at least as much
    precision as float, and the type long double provides at
    least as much precision as double. The set of values of the
    type float is a subset of the set of values of the type
    double; the set of values of the type double is a subset of
    the set of values of the type long double.

The C++ standard adds:

    The value representation of floating-point types is
    implementation-defined.
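One practical consequence of the subset rule is that widening a float to double never changes its value, so a float survives a round trip through double. A small sketch (the function name is invented for this example, and NaN values are left out because they never compare equal):

```c
/* Converts a float to double and back; returns 1 if the value comes
   back unchanged, which the subset guarantee implies for every
   non-NaN float. */
int float_survives_double(float f)
{
    double widened = f;               /* float -> double: always exact */
    float narrowed = (float)widened;  /* back to float */
    return narrowed == f;
}
```

The reverse direction carries no such guarantee: a double generally has to be rounded to fit in a float.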

I would suggest having a look at the excellent What Every
Computer Scientist Should Know About Floating-Point Arithmetic
that covers the IEEE floating-point standard in depth. You'll learn
about the representation details and you'll realize there is a tradeoff
between magnitude and precision. The absolute precision of the floating
point representation increases as the magnitude decreases (adjacent
representable values sit closer together), hence floating point numbers
between -1 and 1 are those with the most precision.
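To make the magnitude/precision tradeoff concrete, the following sketch measures the gap between a float and the next larger float. It assumes float is IEEE 754 binary32, so that adding 1 to the bit pattern of a positive finite value gives the next representable value. `gap_above` is a made-up name; the standard `nextafterf` from `<math.h>` is the portable way to do this.

```c
#include <stdint.h>
#include <string.h>

/* Gap between a positive, finite float (below FLT_MAX) and the next
   larger float. Assumes IEEE 754 binary32: incrementing the bit
   pattern yields the next representable value. */
float gap_above(float x)
{
    uint32_t bits;
    float next;
    memcpy(&bits, &x, sizeof bits);   /* reinterpret float as integer */
    bits += 1;                        /* step to the next float       */
    memcpy(&next, &bits, sizeof next);
    return next - x;
}
```

With these assumptions the gap is 2^-23 (about 1.19e-7) at 1.0f, half that at 0.5f, and a full 2.0 at 16777216.0f, so absolute resolution gets coarser as values grow.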

edited May 24, 2021 at 21:13

answered Mar 5, 2010 at 12:54

Gregory Pakosz
69.7k 20 141 165

32

Given a quadratic equation: x^2 − 4.0000000 x + 3.9999999 = 0, the
exact roots to 10 significant digits are r1 = 2.000316228 and
r2 = 1.999683772.

Using float and double, we can write a test program:

    #include <stdio.h>
    #include <math.h>

    void dbl_solve(double a, double b, double c)
    {
        double d = b*b - 4.0*a*c;
        double sd = sqrt(d);
        double r1 = (-b + sd) / (2.0*a);
        double r2 = (-b - sd) / (2.0*a);
        printf("%.5f\t%.5f\n", r1, r2);
    }

    void flt_solve(float a, float b, float c)
    {
        float d = b*b - 4.0f*a*c;
        float sd = sqrtf(d);
        float r1 = (-b + sd) / (2.0f*a);
        float r2 = (-b - sd) / (2.0f*a);
        printf("%.5f\t%.5f\n", r1, r2);
    }

    int main(void)
    {
        float fa = 1.0f;
        float fb = -4.0000000f;
        float fc = 3.9999999f;
        double da = 1.0;
        double db = -4.0000000;
        double dc = 3.9999999;
        flt_solve(fa, fb, fc);
        dbl_solve(da, db, dc);
        return 0;
    }

Running the program gives me:

2.00000 2.00000
2.00032 1.99968

Note that the numbers aren't large, but still you get cancellation
effects using float .

(In fact, the above is not the best way of solving quadratic
equations using either single- or double-precision floating-point
numbers, but the answer remains unchanged even if one uses a
more stable method.)
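One way to see why a more stable formula cannot rescue the float version: just below 4, adjacent floats are about 2.4e-7 apart, so the coefficient 3.9999999 is rounded to exactly 4.0f as soon as it is stored (assuming IEEE 754 binary32). The information is lost before any arithmetic happens. The sketch below isolates the discriminant; the function names are invented for this example.

```c
/* Discriminant b*b - 4*a*c evaluated in float. */
float disc_float(float a, float b, float c)
{
    return b * b - 4.0f * a * c;
}

/* Discriminant b*b - 4*a*c evaluated in double. */
double disc_double(double a, double b, double c)
{
    return b * b - 4.0 * a * c;
}
```

With the coefficients from the test program, `disc_float(1.0f, -4.0f, 3.9999999f)` is exactly 0, which yields the double root at 2, while `disc_double(1.0, -4.0, 3.9999999)` is about 4e-7, whose square root separates the two roots.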

edited Aug 2, 2023 at 6:57

remcycles
1,473 15 18

answered Mar 5, 2010 at 17:57

Alok Singhal
95.2k 21 127 158

19

A double is 64 bits and single precision (float) is 32 bits.

The double has a bigger mantissa (the bits that hold the significant
digits of the number).
Any inaccuracies will be smaller in the double.
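The bit budget can be checked on a given platform with a few lines of C. This sketch assumes the IEEE 754 layout (one sign bit, an exponent field, and a significand whose leading bit is implicit). The function names are made up for illustration.

```c
#include <float.h>
#include <limits.h>

/* Total storage bits of each type on the current platform. */
int float_bits(void)  { return (int)(sizeof(float)  * CHAR_BIT); }
int double_bits(void) { return (int)(sizeof(double) * CHAR_BIT); }

/* Exponent field width, assuming the IEEE 754 layout: one sign bit,
   an exponent field, and MANT_DIG - 1 stored significand bits. */
int float_exponent_bits(void)
{
    return float_bits() - 1 - (FLT_MANT_DIG - 1);
}

int double_exponent_bits(void)
{
    return double_bits() - 1 - (DBL_MANT_DIG - 1);
}
```

On an IEEE 754 platform this gives 32 bits with an 8-bit exponent for float and 64 bits with an 11-bit exponent for double, so most of the extra 32 bits go to the significand.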

answered Mar 5, 2010 at 12:53

graham.reeds
16.4k 17 74 137

15

I just ran into an error that took me forever to figure out and
potentially can give you a good example of float precision.
    #include <iostream>
    #include <iomanip>

    int main(){
        for(float t=0;t<1;t+=0.01){
            std::cout << std::fixed << std::setprecision(6) << t << std::endl;
        }
    }

The output is

0.000000
0.010000
0.020000
0.030000
0.040000
0.050000
0.060000
0.070000
0.080000
0.090000
0.100000
0.110000
0.120000
0.130000
0.140000
0.150000
0.160000
0.170000
0.180000
0.190000
0.200000
0.210000
0.220000
0.230000
0.240000
0.250000
0.260000
0.270000
0.280000
0.290000
0.300000
0.310000
0.320000
0.330000
0.340000

As you can see after 0.83, the precision runs down significantly.

However, if I set up t as double, such an issue won't happen.

It took me five hours to realize this minor error, which ruined my
program.
edited Mar 10, 2018 at 11:06
nbro
15.9k 34 116 208

answered Oct 20, 2015 at 6:51

Elliscope Fang
351 2 4 8

5 just to be sure: the solution of your issue should be to use an int
preferably? If you want to iterate 100 times, you should count with an
int rather than using a double – BlueTrin Sep 19, 2016 at 12:07

10 Using double is not a good solution here. You use int to count and
do an internal multiplication to get your floating-point value. – Richard
Sep 24, 2017 at 23:10
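A sketch of what the two comments above suggest: count with an int and derive the floating-point value from the counter, so that each value is rounded once instead of inheriting the error of every earlier addition. The function names are invented here, and dividing by 100.0f rather than multiplying by 0.01f is a choice made for this example (0.01f is itself already rounded).

```c
#include <stdio.h>

/* Value of step i, derived from the integer counter with a single
   rounding instead of i accumulated additions. */
float step_value(int i)
{
    return (float)i / 100.0f;
}

/* Visits t = 0.00, 0.01, ..., 0.99 and returns the number of steps. */
int print_steps(void)
{
    int steps = 0;
    for (int i = 0; i < 100; ++i) {
        printf("%.6f\n", step_value(i));
        ++steps;
    }
    return steps;
}
```

The loop always runs exactly 100 times, and each printed value is the float nearest to the intended decimal.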

14

There are three floating point types:

float
double
long double

A simple Venn diagram will explain about the set of values of the
types: [Venn diagram image not included]

answered Sep 7, 2020 at 8:48

Anushil Kumar
732 8 10

12

The size of the numbers involved in the floating-point calculations is
not the most relevant thing. It's the calculation that is being
performed that is relevant.

In essence, if you're performing a calculation and the result is an
irrational number or recurring decimal, then there will be rounding
errors when that number is squashed into the finite size data
structure you're using. Since double is twice the size of float,
the rounding error will be a lot smaller.
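As a small illustration with a recurring decimal, the sketch below measures how far the float value of 1/3 is from the double value of 1/3. It assumes IEEE 754 binary32/binary64 and uses the double result as the reference, which is itself rounded (its own error is below 1e-16). The function name is made up for this example.

```c
/* Difference between 1/3 stored as a float and 1/3 stored as a
   double, expressed as a double. */
double third_error_float(void)
{
    float third_f = 1.0f / 3.0f;
    double third_d = 1.0 / 3.0;
    return (double)third_f - third_d;
}
```

Under these assumptions the float is off by roughly 1e-8, many orders of magnitude more than the double's own rounding error.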

The tests may specifically use numbers which would cause this kind
of error and therefore tested that you'd used the appropriate type
in your code.

edited Mar 10, 2018 at 11:05

nbro
15.9k 34 116 208

answered Mar 5, 2010 at 13:05

Dolbz
2,106 1 16 25

10

Type float, 32 bits long, has a precision of 7 digits. While it may
store values with very large or very small range (+/- 3.4 * 10^38 or *
10^-38), it has only 7 significant digits.

Type double, 64 bits long, has a bigger range (*10^+/-308) and 15
digits precision.

Type long double is nominally 80 bits (on x86), though a given compiler/OS
pairing may store it as 12-16 bytes for alignment purposes. The
long double has an exponent that is just ridiculously huge and should
have 19 digits precision. Microsoft, in their infinite wisdom, limits
long double to 8 bytes, the same as plain double.

Generally speaking, just use type double when you need a floating
point value/variable. Literal floating point values used in
expressions will be treated as doubles by default, and most of the
math functions that return floating point values return doubles.
You'll save yourself many headaches and typecastings if you just use
double.
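The digit counts and ranges quoted in this answer vary by platform, and `<float.h>` reports what the current compiler actually provides. A minimal sketch (the function name is invented here). Note that `FLT_DIG` and `DBL_DIG` count the decimal digits guaranteed to survive a round trip through the type, so on IEEE 754 platforms they are 6 and 15, slightly below the 7.22 and 15.95 figures derived from the bit counts.

```c
#include <float.h>
#include <stdio.h>

/* Prints the guaranteed decimal digits and the largest finite value
   of each floating-point type on the current platform. */
void print_float_limits(void)
{
    printf("float:       %2d digits, max %g\n",  FLT_DIG,  (double)FLT_MAX);
    printf("double:      %2d digits, max %g\n",  DBL_DIG,  DBL_MAX);
    printf("long double: %2d digits, max %Lg\n", LDBL_DIG, LDBL_MAX);
    printf("sizeof(long double) = %zu bytes\n", sizeof(long double));
}
```

The long double line is the one most likely to differ between compilers, in line with the remarks above.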

edited Nov 17, 2017 at 23:29

Peter Mortensen
31.3k 22 109 132

answered Mar 8, 2011 at 5:13

Zain Ali
15.8k 14 97 108

Actually, for float it is between 7 and 8, 7.225 to be exact.
– Peter Mortensen Apr 12, 2013 at 20:25

10

Floats have less precision than doubles. Although you already know,
read What WE Should Know About Floating-Point Arithmetic for
better understanding.

edited Dec 15, 2023 at 17:42

Charles Burns
10.4k 7 64 83

answered Mar 5, 2010 at 12:54

N 1.1
12.4k 6 44 62

For instance, all AVR doubles are floats (four-byte). – Peter Mortensen
Apr 12, 2013 at 20:22

3

When using floating point numbers you cannot trust that your local
tests will be exactly the same as the tests that are done on the
server side. The environment and the compiler are probably
different on your local system and where the final tests are run. I
have seen this problem many times before in some TopCoder
competitions, especially if you try to compare two floating point
numbers.

answered Mar 5, 2010 at 13:00

Tuomas Pelkonen
7,821 2 32 32

3

The built-in comparison operations can behave differently depending on
the type: when you compare 2 floating-point numbers, the difference in
data type (i.e. float or double) may result in different outcomes.
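A short sketch of how the data type changes a comparison, assuming IEEE 754 binary32/binary64. The same decimal literal rounds to different values as a float and as a double, so `==` between them is false. A tolerance-based comparison is one common workaround; `nearly_equal` is a hypothetical helper, and a suitable tolerance depends on the application.

```c
/* Hypothetical helper: true when a and b differ by at most tol. */
int nearly_equal(double a, double b, double tol)
{
    double diff = a > b ? a - b : b - a;
    return diff <= tol;
}

/* The same decimal literal stored as a float and as a double. */
int literal_matches_exactly(void)
{
    float f = 0.1f;
    double d = 0.1;
    return f == d;  /* f is converted to double before comparing */
}
```

Here `literal_matches_exactly()` returns 0 because the two stored values differ by about 1.5e-9, while `nearly_equal(0.1f, 0.1, 1e-7)` returns 1.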

edited Nov 5, 2012 at 1:35

mbinette
5,084 3 25 32

answered Dec 7, 2011 at 7:40

Johnathan Lau
39 2

2

Quantitatively, as other answers have pointed out, the difference is
that type double has about twice the precision, and three times the
range, as type float (depending on how you count).

But perhaps even more important is the qualitative difference. Type
float has good precision, which will often be good enough for
whatever you're doing. Type double, on the other hand, has
excellent precision, which will almost always be good enough for
whatever you're doing.

The upshot, which is not nearly as well known as it should be, is that
you should almost always use type double. Unless you have some
particularly special need, you should almost never use type float.

As everyone knows, "roundoff error" is often a problem when you're
doing floating-point work. Roundoff error can be subtle, and
difficult to track down, and difficult to fix. Most programmers don't
have the time or expertise to track down and fix numerical errors in
floating-point algorithms, because unfortunately, the details end
up being different for every different algorithm. But type double
has enough precision such that, much of the time, you don't have
to worry. You'll get good results anyway. With type float, on the
other hand, alarming-looking issues with roundoff crop up all the
time.

And the thing that's not necessarily different between type float
and double is execution speed. On most of today's general-purpose
processors, arithmetic operations on type float and
double take more or less exactly the same amount of time.
Everything's done in parallel, so you don't pay a speed penalty for
the greater range and precision of type double. That's why it's safe
to make the recommendation that you should almost never use
type float: Using double shouldn't cost you anything in speed,
and it shouldn't cost you much in space, and it will almost definitely
pay off handsomely in freedom from precision and roundoff error
woes.

(With that said, though, one of the "special needs" where you may
need type float is when you're doing embedded work on a
microcontroller, or writing code that's optimized for a GPU. On
those processors, type double can be significantly slower, or
practically nonexistent, so in those cases programmers do typically
choose type float for speed, and maybe pay for it in precision.)

edited Aug 11, 2022 at 0:29

answered Feb 26, 2022 at 12:34

Steve Summit
46.8k 8 76 108

1

If one works with embedded processing, the underlying
hardware (e.g. FPGA or some specific processor / microcontroller
model) may have float implemented optimally in hardware whereas
double will use software routines. So if the precision of a float is
enough to handle the needs, the program will execute several times
faster with float than with double. As noted in other answers, beware of
accumulation errors.

answered May 7, 2020 at 13:36

Lissandro
71 5

-2

Unlike an int (whole number), a float has a decimal point, and
so can a double. But the difference between the two is that a
double is twice as detailed as a float, meaning that it can have
double the amount of numbers after the decimal point.

answered Sep 5, 2017 at 12:10

Nykal
169 2 4

6 It doesn't mean that at all. It actually means twice as many integral
decimal digits, and it is more than double. The relationship between
fractional digits and precision is not linear: it depends on the value: e.g.
0.5 is precise but 0.33333333333333333333 is not. – user207421 Sep 24,
2017 at 23:34
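The point about 0.5 versus 0.333... can be checked directly: a value is stored without error only if it is a sum of powers of two that fits in the significand. A minimal sketch, with a made-up function name, assuming IEEE 754 binary32 for float:

```c
/* Returns 1 when the double value d can be stored in a float without
   any rounding, 0 when the nearest float differs from d. */
int is_exact_in_float(double d)
{
    return (double)(float)d == d;
}
```

Values such as 0.5 and 0.75 pass this check, while 0.1 and 1.0/3.0 do not, regardless of how many decimal digits they appear to have.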

Site design / logo © 2024 Stack Exchange Inc; user contributions licensed under CC BY-SA. rev 2024.7.4.11941
