This post explains how to convert floating point numbers to binary numbers in the IEEE 754 format. A good link on the subject of IEEE 754 conversion exists at Thomas Finleys website. For this post I will stick with the IEEE 754 single precision binary floating-point format: binary32. See this other posting for C++, Java and Python implementations for converting between the binary and decimal formats.

** Expressing numbers in scientific notation**

You may be aware that binary numbers, like decimal numbers, can have decimal points. And that binary numbers, like decimal numbers, can be expressed using scientific notation:

decimal: 923.52 = 9.2352 x 10^{2}

binary: 101011.101 = 1.01011101 x 2^{5}

The number that the 10 or 2 is raised to, the “exponent”, represents the number of places shifted to the left or the right of the decimal point accordingly.

**IEEE 754 **Representation for binary32

In IEEE 754 floating-point representation, the binary number is divided into three sections: the sign bit, the exponent and the mantissa (fractional part).

Sign | Exponent | Mantissa |
---|---|---|

0 | 01111100 | 01000000000000000000000 |

**Sign bit**

This occupies just one bit and represents the sign: 0 for positive and 1 for negative.

**Exponent**

The exponent section for a 16-bit (half-precision) floating point occupies 5 bits and stores the exponent value described above. For 32-bit (single-precision) as in the above binary32 example, this section occupies 8 bits; for 64-bit (double-precision) formats this section will occupy 11 bits.

**Dealing with positive and negative exponents**

An 8-bit exponent encoding can represent integers from 0 (00000000) to 255 (11111111). But what about negative exponents? We need to be able to include these, too. To cover this, we ensure that the exponent is of value 127 greater.

If our exponent is (say) 3 then add 127 to it to give 3 + 127 = 130 (decimal) = 10000010 (binary). This bias is simply 2^{n} – 1 where n is the number of exponent bits, so 8 bit exponent encodings would have a bias of 2^{8} – 1 = 128 – 1 = 127.

If our exponent was minus 3, then the outcome would be -3 + 127 = 124 (decimal) = 1111100 (binary). In other words, (00000000) to (01111111) represents the exponents from -127 to zero, and (10000000) to (11111111) would represent the exponents from +1 to 128.

**Mantissa**

The third section of our 32-bit representation is 23 bits long. The mantissa, sometimes called the significand, represents the fractional part of the number in binary scientific notation ie the binary number to the right of the decimal point.

**Example: 12.375 into IEEE 754 binary format**

This example for converting from decimal representation into a binary32 format is taken from the Wikipedia page. Consider the number 12.375.

Take the non-fractional part of 12.375 and convert it into binary in the normal way:

12 (decimal) is 1100 (binary)

Since 12 = (8 * 1) + (4 * 1) + (2 * 0) + (2 * 0)

Converting the fractional part (0.375) into binary is done using the following procedure:

1. multiply the fraction by 2

2. keep the integer part of multiplication as the binary result

3. re-multiply new fraction by 2

4. repeat 1 – 3 until a fraction of zero is found or until the precision limit is reached which is 23 fraction digits for IEEE 754 binary32 format i.e.:

`0.375 x 2 = 0.750 = 0 + 0.750 => 0`

0.750 x 2 = 1.500 = 1 + 0.500 => 1

0.500 x 2 = 1.000 = 1 + 0.000 => 1

The fraction part eventually comes to 0.000, so we terminate. The binary result is 011, therefore

0.375 (decimal) is 0.011 (binary)

So

12.375 (decimal) is now 1100.011 (binary).

** Convert result to the required binary scientific format**

IEEE 754 binary32 format requires that you represent values in the scientific format described previously, so that

**1100.011 = 1.100011 x 2 ^{3}.**

From this scientific notation we can now deduce:

Sign = 0 (positive number)

Exponent = 3

bias = 2^{8}-1 = 127 (8-bit exponent encoding for binary32)

adding this to the exponent gives:

3 + 127 = 130 (decimal) = 10000010 (binary)

Mantissa = 100011 (fractional part to the right of the decimal point)

From these we form the resulting 32 bit IEEE 754 binary32 format representation of 12.375 as:

Sign | Exponent | Mantissa |
---|---|---|

0 | 10000010 | 10001100000000000000000 |

This weblog is unquestionably relatively useful considering that I’m with the instant producing a web floral web site – though I’m only commencing out consequently it is actually rather tiny, absolutely nothing such as this web site. Can website link to some from the posts right here because they are very. Many thanks a lot. Zoey Olsen

GOOD ……

Thanks Yogesh.

Superb. Nice way of Explaining

Thank you Riyaz.

Very nice tutorials on IEEE 754 float point representation

Good one..

Thank you Somanath.

got a great help to understand to conversion of floating point to ieee

Thanks a lot!