Converting IEEE 754 Floating Point into Binary

This post explains how to convert decimal numbers into their binary representation in the IEEE 754 floating-point format. Thomas Finley's website has a good reference on the subject of IEEE 754 conversion. This post sticks with the IEEE 754 single-precision binary floating-point format, binary32. See this other posting for C++, Java and Python implementations for converting between the binary and decimal formats.

Expressing numbers in scientific notation

You may be aware that binary numbers, like decimal numbers, can have a fractional part to the right of the point (a binary point rather than a decimal point). And binary numbers, like decimal numbers, can be expressed using scientific notation:
decimal: 923.52 = 9.2352 x 10²
binary: 101011.101 = 1.01011101 x 2⁵

The number that the 10 or 2 is raised to, the “exponent”, is the number of places the point has been shifted: positive when the point moves to the left, negative when it moves to the right.
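As a rough illustration of the binary case, the sketch below uses Python's standard `math.frexp` to split a value into a significand between 1 and 2 and a power of two. The function name `to_binary_scientific` is made up for this post, and the sketch assumes a positive, finite input.

```python
import math

def to_binary_scientific(x):
    """Return (significand, exponent) with x == significand * 2**exponent
    and 1 <= significand < 2. Assumes x is positive and finite."""
    m, e = math.frexp(x)   # frexp gives x == m * 2**e with 0.5 <= m < 1
    return m * 2, e - 1    # rescale so the digit before the point is 1

# 101011.101 (binary) is 101011101 (binary) shifted three places, i.e. 43.625
significand, exponent = to_binary_scientific(0b101011101 / 2**3)
print(significand, exponent)  # 1.36328125 5
```

Here 1.36328125 is the decimal value of 1.01011101 (binary), and the exponent 5 matches the example above.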

IEEE 754 Representation for binary32

In IEEE 754 floating-point representation, the binary number is divided into three sections: the sign bit, the exponent and the mantissa (fractional part).

Sign	Exponent	Mantissa
0	01111100	01000000000000000000000
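To see these three sections for a real value, here is a minimal sketch using Python's standard `struct` module to reinterpret the four bytes of a binary32 value as an unsigned integer. The helper name `float_to_fields` is illustrative only. The bit pattern in the table above is the value 0.15625.

```python
import struct

def float_to_fields(x):
    """Return the (sign, exponent, mantissa) bit strings of x stored as binary32."""
    # Pack as a big-endian 4-byte float, then read the same bytes as an unsigned int
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = format(bits, "032b")   # 32 characters, most significant bit first
    return s[0], s[1:9], s[9:]

print(float_to_fields(0.15625))
# ('0', '01111100', '01000000000000000000000')
```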

Sign bit

This occupies just one bit and represents the sign: 0 for positive and 1 for negative.

Exponent

The exponent section stores the exponent value described above. In the 16-bit (half-precision) format it occupies 5 bits; in the 32-bit (single-precision) format, as in the binary32 example above, it occupies 8 bits; in the 64-bit (double-precision) format it occupies 11 bits.

Dealing with positive and negative exponents

An 8-bit exponent encoding can represent integers from 0 (00000000) to 255 (11111111). But what about negative exponents? We need to be able to include these, too. To cover this, we add a fixed bias of 127 to the exponent before storing it, so the stored value is always 127 greater than the true exponent.

If our exponent is (say) 3, we add 127 to it to give 3 + 127 = 130 (decimal) = 10000010 (binary). The bias is simply 2ⁿ⁻¹ – 1, where n is the number of exponent bits, so an 8-bit exponent encoding has a bias of 2⁷ – 1 = 128 – 1 = 127.

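If our exponent were minus 3, the outcome would be -3 + 127 = 124 (decimal) = 01111100 (binary). In other words, stored values up to 127 (01111111) represent exponents of zero or below, and stored values from 128 (10000000) upwards represent positive exponents. The two extreme patterns are reserved: 00000000 is used for zero and subnormal numbers, and 11111111 for infinity and NaN. Normal binary32 numbers therefore have exponents from -126 (00000001) to +127 (11111110).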
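The bias arithmetic can be sketched in a few lines of Python. The names `exponent_bias` and `encode_exponent` are invented for this example. The bias for an n-bit exponent field is 2ⁿ⁻¹ – 1, and the sketch rejects the all-zeros and all-ones patterns because IEEE 754 reserves them for special values.

```python
def exponent_bias(exponent_bits):
    """Bias for an exponent field of the given width: 2**(n - 1) - 1."""
    return 2 ** (exponent_bits - 1) - 1

def encode_exponent(exponent, exponent_bits=8):
    """Return the biased exponent as a bit string (normal numbers only)."""
    biased = exponent + exponent_bias(exponent_bits)
    # All-zeros and all-ones are reserved (zero/subnormals and infinity/NaN)
    if not 0 < biased < 2 ** exponent_bits - 1:
        raise ValueError("exponent out of range for a normal number")
    return format(biased, "0{}b".format(exponent_bits))

print(exponent_bias(5), exponent_bias(8), exponent_bias(11))  # 15 127 1023
print(encode_exponent(3))    # 10000010
print(encode_exponent(-3))   # 01111100
```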

Mantissa

The third section of our 32-bit representation is 23 bits long. The mantissa, sometimes called the significand, represents the fractional part of the number in binary scientific notation, i.e. the binary digits to the right of the point. The leading 1 to the left of the point is not stored, because every normalised binary number starts with it.
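Putting the three sections together, a normal binary32 value equals (-1)^sign x (1 + mantissa / 2²³) x 2^(exponent – 127). The sketch below decodes the fields by hand under that assumption. The function name `fields_to_value` is hypothetical, and zero, subnormals, infinity and NaN are deliberately not handled.

```python
def fields_to_value(sign, exponent, mantissa):
    """Decode binary32 bit-string fields into a float (normal numbers only)."""
    s = -1.0 if sign == "1" else 1.0
    e = int(exponent, 2) - 127                       # remove the bias
    # Restore the implicit leading 1, then add the stored fraction
    significand = 1 + int(mantissa, 2) / 2 ** len(mantissa)
    return s * significand * 2.0 ** e

print(fields_to_value("0", "01111100", "01000000000000000000000"))  # 0.15625
```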

Example: 12.375 into IEEE 754 binary format

This example for converting from decimal representation into a binary32 format is taken from the Wikipedia page. Consider the number 12.375.

Take the non-fractional part of 12.375 and convert it into binary in the normal way:

12 (decimal) is 1100 (binary)

Since 12 = (8 * 1) + (4 * 1) + (2 * 0) + (1 * 0)
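One common way to do this by hand is repeated division by 2, collecting the remainders from last to first. The sketch below is just one way to write that, assuming a non-negative integer. The name `int_to_binary` is illustrative, and Python's built-in `bin` gives the same digits.

```python
def int_to_binary(n):
    """Convert a non-negative integer to a binary digit string."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))   # remainder is the next least significant bit
        n //= 2
    return "".join(reversed(digits))

print(int_to_binary(12))  # 1100
```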

Converting the fractional part (0.375) into binary is done using the following procedure:

1. multiply the fraction by 2
2. keep the integer part of the result as the next binary digit
3. carry the remaining fraction forward to the next multiplication
4. repeat steps 1 – 3 until the fraction is zero or the precision limit is reached, which is 23 fraction digits for the IEEE 754 binary32 format, i.e.:

0.375 x 2 = 0.750 = 0 + 0.750 => 0
0.750 x 2 = 1.500 = 1 + 0.500 => 1
0.500 x 2 = 1.000 = 1 + 0.000 => 1

The fraction part eventually comes to 0.000, so we terminate. The binary result is 011, therefore

0.375 (decimal) is 0.011 (binary)
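The multiply-by-2 procedure above could be written roughly as follows. The function name `fraction_to_binary` and the `max_bits` parameter are assumptions made for this sketch, and the input is expected to be a fraction between 0 and 1.

```python
def fraction_to_binary(fraction, max_bits=23):
    """Return the binary digits of a fraction in [0, 1), truncated to max_bits."""
    bits = []
    while fraction != 0 and len(bits) < max_bits:
        fraction *= 2
        integer_part = int(fraction)   # 0 or 1: the next binary digit
        bits.append(str(integer_part))
        fraction -= integer_part       # keep only the new fraction
    return "".join(bits)

print(fraction_to_binary(0.375))  # 011
```

A fraction such as 0.1 never reaches zero in binary, so for it the loop stops when `max_bits` digits have been produced.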

12.375 (decimal) is now 1100.011 (binary).

Convert result to the required binary scientific format

IEEE 754 binary32 format requires that you represent values in the scientific format described previously, so that

1100.011 = 1.100011 x 2³.
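This normalisation step is only a matter of moving the point, which the following sketch does on the digit string itself. The name `normalise` is made up for illustration, and the sketch assumes a positive value with an integer part of at least 1, as in this example.

```python
def normalise(binary):
    """Return (fraction_digits, exponent) so that binary == 1.fraction_digits x 2**exponent.
    Assumes a positive binary string with an integer part of at least 1."""
    integer_part, _, fraction_part = binary.partition(".")
    integer_part = integer_part.lstrip("0")
    if not integer_part:
        raise ValueError("integer part must be at least 1 in this sketch")
    exponent = len(integer_part) - 1             # places the point moves to the left
    digits = (integer_part[1:] + fraction_part).rstrip("0")
    return digits, exponent

print(normalise("1100.011"))  # ('100011', 3)
```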

From this scientific notation we can now deduce:

Sign = 0 (positive number)

Exponent = 3

bias = 2⁸⁻¹ – 1 = 2⁷ – 1 = 127 (8-bit exponent encoding for binary32)
adding this to the exponent gives:

3 + 127 = 130 (decimal) = 10000010 (binary)

Mantissa = 100011 (fractional part to the right of the point, padded with trailing zeros to fill 23 bits)

From these we form the resulting 32 bit IEEE 754 binary32 format representation of 12.375 as:

Sign	Exponent	Mantissa
0	10000010	10001100000000000000000
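As a sanity check, the hand-derived fields can be compared with what the machine actually stores. This sketch relies on Python's standard `struct` module to obtain the binary32 bit pattern. The variable names are illustrative only.

```python
import struct

# Fields derived by hand in the example above
sign = "0"
exponent = format(3 + 127, "08b")     # biased exponent as 8 bits
mantissa = "100011".ljust(23, "0")    # pad the fraction with zeros to 23 bits
by_hand = sign + exponent + mantissa

# Bit pattern produced by packing 12.375 as a big-endian binary32 value
(bits,) = struct.unpack(">I", struct.pack(">f", 12.375))
by_machine = format(bits, "032b")

print(by_hand)                # 01000001010001100000000000000000
print(by_hand == by_machine)  # True
```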

About The Author

Andy
