
16.2 Other Stuff to Know

The rest of this chapter uses a number of terms. Here are some informal definitions that should help you work your way through the material:


Accuracy

A floating-point calculation’s accuracy is how close it comes to the real (paper and pencil) value.


Error

The difference between what the result of a computation “should be” and what it actually is. It is best to minimize error as much as possible.


Exponent

The order of magnitude of a value; some number of bits in a floating-point value store the exponent.


Inf

A special value representing infinity. Arithmetic operations that combine a finite number with infinity, such as addition or multiplication by a nonzero value, produce infinity.


NaN

“Not a number.” A special value that results from attempting a calculation that has no answer as a real number. In such a case, programs can either receive a floating-point exception, or get NaN back as the result. The IEEE 754 standard recommends that systems return NaN. Some examples:


sqrt(-1)

This makes sense in the range of complex numbers, but not in the range of real numbers, so the result is NaN.


log(-8)

-8 is out of the domain of log(), so the result is NaN.


Normalized

How the significand (see later in this list) is usually stored. The value is adjusted so that the first bit is one, and then that leading one is assumed instead of physically stored. This provides one extra bit of precision.


Precision

The number of bits used to represent the significand of a floating-point number. The more bits, the more digits you can represent. Binary and decimal precisions are related approximately, according to the formula:

prec = 3.322 * dps

Here, prec denotes the binary precision (measured in bits) and dps (short for decimal places) is the number of decimal digits. The factor 3.322 approximates log2(10), the number of bits needed per decimal digit.

Rounding mode

How numbers are rounded up or down when necessary. More details are provided later.


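Significand

A floating-point value consists of the significand multiplied by the base raised to the power of the exponent. In decimal notation the base is 10: for example, in 1.2345e67, the significand is 1.2345. The IEEE 754 binary formats use base 2.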


Stability

From the Wikipedia article on numerical stability: “Calculations that can be proven not to magnify approximation errors are called numerically stable.”

See the Wikipedia article on accuracy and precision for more information on some of those terms.
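The Inf and NaN definitions above can be tried out in any language whose floating-point type is an IEEE 754 double. The following is a minimal sketch in Python rather than awk, chosen only because its behavior is easy to check. The helper names safe_sqrt and safe_log are hypothetical. Python's math functions take the "floating-point exception" route for domain errors, so the helpers convert that exception into NaN, the result IEEE 754 recommends.

```python
import math

def safe_sqrt(x):
    """Return sqrt(x), or NaN when x has no real square root."""
    try:
        return math.sqrt(x)
    except ValueError:
        # math.sqrt raises for negative input; map that to NaN instead
        return math.nan

def safe_log(x):
    """Return the natural log of x, or NaN when x is outside log's domain."""
    try:
        return math.log(x)
    except ValueError:
        return math.nan

print(math.inf + 1)          # inf
print(math.inf - math.inf)   # nan: this one has no meaningful answer
print(safe_sqrt(-1))         # nan
print(safe_log(-8))          # nan

# NaN compares unequal to everything, including itself,
# so test for it with math.isnan() rather than ==.
value = safe_sqrt(-1)
print(value == value)        # False
print(math.isnan(value))     # True
```

Note that not every operation on infinity yields infinity: subtracting infinity from itself produces NaN, as the second line of output shows.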

On modern systems, floating-point hardware uses the representation and operations defined by the IEEE 754 standard. Three of the standard IEEE 754 types are 32-bit single precision, 64-bit double precision, and 128-bit quadruple precision. The standard also specifies extended precision formats to allow greater precisions and larger exponent ranges. (awk uses only the 64-bit double-precision format.)
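To make the 64-bit double-precision layout concrete, here is a small Python sketch that splits a double into its stored fields and rebuilds the value from them. It relies on details of the IEEE 754 double format that this section does not spell out (1 sign bit, 11 exponent bits stored with a bias of 1023, and 52 stored significand bits), and the function names are hypothetical. It handles only normalized values, not zero, subnormals, Inf, or NaN.

```python
import struct

def decompose_double(x):
    """Return (sign, biased_exponent, fraction) for a 64-bit double."""
    # Reinterpret the 8 bytes of the double as one unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                       # 1 bit
    biased_exponent = (bits >> 52) & 0x7FF  # 11 bits
    fraction = bits & ((1 << 52) - 1)       # 52 stored significand bits
    return sign, biased_exponent, fraction

def rebuild_normal(sign, biased_exponent, fraction):
    """Rebuild a normalized double from its fields."""
    # The leading one is implied rather than stored, hence the "1 +".
    significand = 1 + fraction / 2**52
    return (-1) ** sign * significand * 2.0 ** (biased_exponent - 1023)

print(decompose_double(1.0))                   # (0, 1023, 0)
print(rebuild_normal(*decompose_double(0.1)))  # 0.1
```

The 52 stored bits plus the implied leading one give the 53 bits of precision of the double format.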

Table 16.3 lists the precision and exponent field values for the basic IEEE 754 binary formats.

Name        Total bits   Precision   Minimum exponent   Maximum exponent
Single          32           24            -126               +127
Double          64           53           -1022              +1023
Quadruple      128          113          -16382             +16383

Table 16.3: Basic IEEE format values

NOTE: The precision numbers include the implied leading one that gives them one extra bit of significand.
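As a rough illustration, the precision column of the table can be converted to decimal digits with the formula prec = 3.322 * dps given earlier. This Python sketch (not awk) uses a hypothetical helper named decimal_digits, and the results are approximations only.

```python
import math
import sys

BITS_PER_DECIMAL_DIGIT = 3.322  # approximates log2(10)

def decimal_digits(prec):
    """Approximate decimal digits for a binary precision in bits."""
    return prec / BITS_PER_DECIMAL_DIGIT

# Precision values taken from the table of basic IEEE 754 formats.
formats = {"Single": 24, "Double": 53, "Quadruple": 113}
for name, prec in formats.items():
    print(f"{name:<10} {prec:>3} bits  about {decimal_digits(prec):.1f} decimal digits")

# On platforms where Python's float is an IEEE 754 double,
# the significand precision reported by the interpreter is 53 bits.
print(sys.float_info.mant_dig)
print(math.log2(10))  # close to 3.322
```

This is why double-precision values are usually described as carrying 15 to 16 significant decimal digits.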



Thanks to Michael Brennan for the description of NaN above, which we have paraphrased, and for the examples.