Can every float be expressed exactly as a double?

Can every possible value of a float variable be represented exactly in a double variable?

In other words, for all possible values X will the following be successful:

float f1 = X;
double d = f1;
float f2 = (float)d;

if (f1 == f2)
    System.out.println("Success");

My suspicion is that there is no exception, or if there is it is only for an edge case (like +/- infinity or NaN).

Edit: Original wording of question was confusing (stated two ways, one which would be answered "no" the other would be answered "yes" for the same answer). I've reworded it so that it matches the question title.

Asked by: Joyce936 | Posted: 21-01-2022

Answer 1


Proof by enumeration of all possible cases:

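public class TestDoubleFloat {
    public static void main(String[] args) {
        // A long counter is required: an int counter could never exceed Integer.MAX_VALUE.
        for (long i = Integer.MIN_VALUE; i <= Integer.MAX_VALUE; i++) {
            float f1 = Float.intBitsToFloat((int) i);
            double d = (double) f1;
            float f2 = (float) d;
            if (f1 != f2) {
                if (Float.isNaN(f1) && Float.isNaN(f2)) {
                    continue; // ok, NaN never compares equal to anything
                }
                fail("oops: " + f1 + " != " + f2);
            }
        }
    }

    private static void fail(String message) {
        throw new AssertionError(message);
    }
}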

Finishes in 12 seconds on my machine. 32 bits are small.

Answered by: Ada888 | Posted: 22-02-2022

Answer 2

In theory, there is no such value, so "yes", every float should be representable as a double. Converting a float to a double amounts to widening the fields: the exponent is re-biased into 11 bits and the fraction is padded with 29 zero bits on the end. They are stored using the same format, just with different sized fields.

Answered by: Walter461 | Posted: 22-02-2022

Answer 3

Yes, floats are a subset of doubles. Both floats and doubles have the form (sign * a * 2^b). The difference between floats and doubles is the number of bits in a & b. Since doubles have more bits available, assigning a float value to a double effectively means inserting extra 0 bits.
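
To make the "inserting extra 0 bits" idea concrete, here is a minimal Java sketch that builds the double bit pattern by hand and compares it with what the cast produces. The class and method names (FloatWidening, widenNormal) are made up for illustration, and the sketch only handles normal floats; zeros, subnormals, infinities and NaNs need separate treatment.

```java
final class FloatWidening {
    /**
     * Builds the double bit pattern for a normal float by hand:
     * copy the sign, re-bias the exponent, pad the fraction with zeros.
     * Hypothetical helper, not part of any library.
     */
    static long widenNormal(float f) {
        int bits = Float.floatToRawIntBits(f);
        long sign = (bits >>> 31) & 0x1L;       // 1 sign bit
        long exponent = (bits >>> 23) & 0xFFL;  // 8 exponent bits, bias 127
        long fraction = bits & 0x7FFFFFL;       // 23 fraction bits
        if (exponent == 0 || exponent == 0xFF) {
            // Zeros/subnormals (0) and infinities/NaNs (0xFF) are out of scope here.
            throw new IllegalArgumentException("only normal floats are handled in this sketch");
        }
        long doubleExponent = exponent - 127 + 1023; // re-bias for the 11-bit field
        long doubleFraction = fraction << 29;        // 52 - 23 = 29 extra zero bits
        return (sign << 63) | (doubleExponent << 52) | doubleFraction;
    }

    public static void main(String[] args) {
        float[] samples = {1.0f, -2.5f, 0.1f, Float.MAX_VALUE, Float.MIN_NORMAL};
        for (float f : samples) {
            long byHand = widenNormal(f);
            long byCast = Double.doubleToRawLongBits((double) f);
            System.out.println(f + " -> 0x" + Long.toHexString(byHand)
                    + (byHand == byCast ? " (matches the cast)" : " (differs from the cast)"));
        }
    }
}
```

No rounding happens anywhere in widenNormal, which is the point: every bit of the float has a place in the double.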

Answered by: Blake919 | Posted: 22-02-2022

Answer 4

As everyone has already said, "no". But that's actually a "yes" to the question itself, i.e. every float can be exactly expressed as a double. Confusing. :)

Answered by: Kellan215 | Posted: 22-02-2022

Answer 5

If I'm reading the language specification correctly (and as everyone else is confirming), there is no such value.

That is, each type claims to hold only IEEE 754 standard values, so casts between the two should change nothing except the amount of memory used.

(Clarification: there would be no change as long as the value fits in a float to begin with; obviously, if the value needs more bits than a float can hold, casting from double to float would result in a loss of precision.)
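
The reverse direction described in the clarification can be checked with a small sketch. The helper name fitsInFloat is hypothetical; it simply tests whether a double survives a trip through float.

```java
final class NarrowingDemo {
    /** True if the double comes back unchanged after being narrowed to float. */
    static boolean fitsInFloat(double d) {
        return (double) (float) d == d;
    }

    public static void main(String[] args) {
        System.out.println(fitsInFloat(0.5));           // true: exactly representable in both
        System.out.println(fitsInFloat((double) 0.1f)); // true: the value started life as a float
        System.out.println(fitsInFloat(0.1));           // false: needs more fraction bits than float has
        System.out.println(fitsInFloat(1e40));          // false: overflows to Infinity as a float
    }
}
```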

Answered by: Richard461 | Posted: 22-02-2022

Answer 6

@KenG: This code:

float a = 0.1F
println "a=${a}"
double d = a
println "d=${d}"

fails not because 0.1f can't be exactly represented. The question was "is there a float value that cannot be represented as a double", which this code doesn't prove. Although 0.1f can't be stored exactly, the value that a is given (which isn't 0.1f exactly) can be stored as a double (which also won't be 0.1f exactly). Assuming an Intel FPU, the bit pattern for a is:

0 01111011 10011001100110011001101

and the bit pattern for d is:

0 01111111011 100110011001100110011010 (followed by lots more zeros)

which has the same sign, the same exponent (-4 in both cases) and the same fractional part (the fields are separated by spaces above). The difference in the output comes from the string conversion, not from the stored value. There is one function for float-to-string and another for double-to-string, and each prints only as many decimal digits as its own type needs. The second non-zero digit of the number (the first is the 1 after the point) lies beyond the digits needed to identify a float, so it only shows up when the value is printed as a double. If the to-string function was optimised to use the FPU stack to store the intermediate results of the to-string process, the output would be the same for float and double, since the FPU uses the same, larger format (80 bits) for both float and double.

There are no float values that can't be stored identically in a double, i.e. the set of float values is a subset of the set of double values.
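
The same experiment can be repeated in plain Java, as a rough sketch. The class name TenthDemo and the helper exactValue are made up for illustration; BigDecimal is used only because its double constructor yields the exact binary value without rounding.

```java
import java.math.BigDecimal;

final class TenthDemo {
    /** Exact decimal expansion of the binary value held in d. */
    static String exactValue(double d) {
        return new BigDecimal(d).toPlainString();
    }

    public static void main(String[] args) {
        float a = 0.1f;
        double d = a;

        // The two to-string routines print different digit counts...
        System.out.println("a=" + a); // a=0.1
        System.out.println("d=" + d); // d=0.10000000149011612

        // ...but the stored value is one and the same.
        System.out.println(exactValue(d)); // 0.100000001490116119384765625
        System.out.println((float) d == a); // true

        // Raw bit patterns, for comparison with the ones listed above.
        System.out.println(Integer.toBinaryString(Float.floatToRawIntBits(a)));
        System.out.println(Long.toBinaryString(Double.doubleToRawLongBits(d)));
    }
}
```

Note that toBinaryString drops leading zeros, so the sign bit (0) is not shown in the last two lines.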

Answered by: Adelaide704 | Posted: 22-02-2022

Answer 7

Snark: NaNs will compare differently after (or indeed before) conversion.

This does not, however, invalidate the answers already given.

Answered by: Julia217 | Posted: 22-02-2022

Answer 8

I took the code you listed and decided to try it in C++ since I thought it might execute a little faster and it is significantly easier to do unsafe casting. :-D

I found out that for valid numbers, the conversion works and you get the exact bitwise representation after the cast. However, for non-numbers, e.g. 1.#QNAN0, etc., the result will use a simplified representation of the non-number rather than the exact bits of the source. For example:

**** FAILURE **** 2140188725 | 1.#QNAN0 -- 0xa0000000 0x7ffa1606

I cast an unsigned int to float then to double and back to float. The number 2140188725 (0x7F90B035) results in a NAN and converting to double and back is still a NAN but not the exact same NAN.

Here is the simple C++ code:

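#include <cstdio>
#include <cstring>

typedef unsigned int uint;

int main() {
    for (uint i = 0; i < 0xFFFFFFFF; ++i) {
        float f1;
        std::memcpy(&f1, &i, sizeof f1);  // reinterpret the bits as a float
        double d = f1;
        float f2 = (float)d;
        uint bits2;
        std::memcpy(&bits2, &f2, sizeof bits2);
        if (f1 != f2)  // also true for every NaN, since NaN != NaN
            std::printf("**** FAILURE **** %u | %f -- 0x%08x 0x%08x\n", i, f1, i, bits2);
        if ((i % 1000000) == 0)
            std::printf("Iteration: %u\n", i);
    }
}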

Answered by: Justin123 | Posted: 22-02-2022

Answer 9

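The answer to the first question is yes; the answer to the 'in other words', however, is no, because a NaN never compares equal to anything, itself included. That makes both f1 == f2 and !(f1 != f2) false for NaN values. If you change the test in the code to if (Float.compare(f1, f2) == 0), which treats NaN as equal to itself, the answer to the second question becomes yes: it will print 'Success' for all float values.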
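
A short Java sketch of how the different tests behave after a round trip through double. The names RoundTripCompare, roundTrip and sameValue are hypothetical; Float.compare is the standard library method, which is documented to treat NaN as equal to itself.

```java
final class RoundTripCompare {
    /** float -> double -> float, as in the question. */
    static float roundTrip(float f) {
        return (float) (double) f;
    }

    /** Comparison that treats NaN as equal to NaN (and +0.0f as different from -0.0f). */
    static boolean sameValue(float a, float b) {
        return Float.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        float nan = Float.NaN;
        float back = roundTrip(nan);

        System.out.println(nan == back);          // false: NaN is never == anything
        System.out.println(!(nan != back));       // false: NaN != anything is always true
        System.out.println(sameValue(nan, back)); // true

        float x = 1.5f;
        System.out.println(x == roundTrip(x));    // true for any non-NaN float
    }
}
```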

Answered by: Carlos241 | Posted: 22-02-2022

Answer 10

In theory, every normal single can have its exponent and mantissa padded to create a double; remove the padding and you are back at the original single.

It is when you go from theory to reality that you run into problems. I don't know whether you are interested in theory or implementation. If it is implementation, you can rapidly get into trouble.

IEEE is a horrible format. My understanding is that it was intentionally designed to be so tough that nobody could meet it, allowing the market to catch up to Intel (this was a while back) and so allowing for more competition. If that is true, it failed; either way we are stuck with this dreadful spec. Something like the TI format is far superior for the real world in so many ways. I have no connection to either company or any of these formats.

Thanks to this spec there are very few, if any, FPUs that actually meet it (in hardware, or even in hardware plus the operating system), and those that do often fail on the next generation (google: TestFloat). The problems these days tend to lie in int-to-float and float-to-int conversions, not in single-to-double and double-to-single as you have specified above. Of course, what operation is the FPU going to perform to do that conversion? Add 0? Multiply by 1? That depends on the FPU and the compiler.

The problem with IEEE, as it relates to your question, is that there is more than one way to represent a number (not every number, but many). If I wanted to break your code I would start with minus zero, in the hope that one of the two operations would convert it to a plus zero. Then I would try denormals. It should also fail with a signaling NaN, but you called that out as a known exception.

The problem is that equal sign; here is rule number one about floating point: never use an equal sign. Equals compares values, not bits, and for IEEE values the two are not the same thing: plus zero and minus zero compare equal even though their bits differ, while a NaN never compares equal, not even to an identical NaN. So the test above would not notice a conversion that turned minus zero into plus zero, and it reports a mismatch for every NaN.
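
For the specific edge cases suggested here (minus zero, denormals), a quick Java sketch can compare raw bit patterns instead of relying on ==. The names EdgeCaseRoundTrip and bitsSurvive are made up for illustration, and the results apply to Java's float and double on a conforming JVM, not to every FPU and compiler combination.

```java
final class EdgeCaseRoundTrip {
    /** True if float -> double -> float gives back exactly the same 32 bits. */
    static boolean bitsSurvive(float f) {
        float back = (float) (double) f;
        return Float.floatToRawIntBits(f) == Float.floatToRawIntBits(back);
    }

    public static void main(String[] args) {
        // In Java, == on floats is a numeric comparison: the two zeros are equal...
        System.out.println(0.0f == -0.0f); // true
        // ...although their bit patterns differ.
        System.out.println(Float.floatToRawIntBits(0.0f) == Float.floatToRawIntBits(-0.0f)); // false

        System.out.println(bitsSurvive(-0.0f));                   // true: the sign of zero is kept
        System.out.println(bitsSurvive(Float.MIN_VALUE));         // true: smallest denormal
        System.out.println(bitsSurvive(Float.NEGATIVE_INFINITY)); // true
    }
}
```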

I realize that you probably used the equal to explain the problem and not necessarily the code you wanted to succeed or fail.

Answered by: Arnold690 | Posted: 22-02-2022

Answer 11

If a floating-point type is viewed as representing a precise value, then as other posters have noted, every float value is representable as a double, but only a few values of double can be represented by float. On the other hand, if one recognizes that floating-point values are approximations, one will realize the real situation is reversed. If one uses a very precise instrument to measure something which is 3.437mm, one may correctly describe its size as 3.4mm. If one uses a ruler to measure the object as 3.4mm, it would be incorrect to describe its size as 3.400mm.

Even bigger problems exist at the top of the range. There is a float value that represents: "computed value exceeded 2^127 by an unknown amount", but there's no double value that indicates such a thing. Casting an "infinity" from single to double will yield a value "computed value exceeded 2^1023 by an unknown amount" which is off by a factor of over a googol.

Answered by: Elise637 | Posted: 22-02-2022
