The abstract data `IEEE' may be used to implement a cross-compiler. It implements IEEE floating point arithmetic in a machine-independent way with the aid of package `arithm'. It is needed because the host machine may not support the floating point arithmetic of the target machine; for example, the VAX does not support IEEE floating point arithmetic. Floating point numbers are represented by bytes in big endian order. The package functions are not efficient enough for run-time use; they are intended for implementing constant folding in compilers. All integer sizes (see the transformation functions) are given in bytes and must be positive.
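The big endian byte representation mentioned above can be illustrated with a minimal sketch. It assumes only the field widths of the IEEE single precision format (1 sign bit, 8 exponent bits, 23 fraction bits); the function name `pack_single_big_endian' is hypothetical and is not part of the package.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the three fields of an IEEE single precision number into
   4 bytes, most significant byte first (big endian).  The name is
   hypothetical; it is not a function of the package.  */
void
pack_single_big_endian (int sign, unsigned biased_exponent,
                        uint32_t fraction, unsigned char bytes[4])
{
  uint32_t bits;

  assert (biased_exponent <= 0xFF && fraction <= 0x7FFFFF);
  bits = ((uint32_t) (sign != 0) << 31)
         | ((uint32_t) biased_exponent << 23) | fraction;
  bytes[0] = (unsigned char) (bits >> 24);   /* sign and high exponent bits */
  bytes[1] = (unsigned char) (bits >> 16);   /* low exponent bit, fraction */
  bytes[2] = (unsigned char) (bits >> 8);
  bytes[3] = (unsigned char) bits;
}
```

Because the bytes are produced by shifts rather than by copying a host `float', the result is the same on any host, which is the property a cross-compiler needs.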
The functions for addition, subtraction, multiplication, division, and conversion between floating point formats can raise input exceptions. If an operand of such an operation is a trapping (signaling) NaN, the invalid operation and reserved operand exceptions are raised and the result is a (quiet) NaN; otherwise, if an operand is a (quiet) NaN, only the reserved operand exception is raised and the result is a (quiet) NaN. Operation-specific processing of the remaining special case operand values is given with the description of each operation. In the general case a function can also raise output exceptions and produces results for them according to the following table. The result and status for a given exceptional operation are determined by the highest priority exception. If, for example, an operation produces both overflow and imprecise result exceptions, the overflow exception, having higher priority, determines the behavior of the operation, which is therefore described by the Overflow entry of the table.
Exception  |Condition|                     |Result |Status
-----------|---------|---------------------|-------|-------------
           |masked   |       IEEE_RN(_RP)  | +Inf  |IEEE_OFL and
           |overflow | sign + IEEE_RZ(_RM) | +Max  |IEEE_IMP
           |exception|---------------------|-------|-------------
Overflow   |         | sign - IEEE_RN(_RM) | -Inf  |IEEE_OFL and
           |         |       IEEE_RZ(_RP)  | -Max  |IEEE_IMP
           |---------|---------------------|-------|-------------
           |unmasked | Precise result      |See    |IEEE_OFL
           |overflow |---------------------|above  |-------------
           |exception| Imprecise result    |       |IEEE_OFL and
           |         |                     |       |IEEE_IMP
-----------|---------|---------------------|-------|-------------
           |masked   |                     |Rounded|IEEE_UFL and
           |underflow| Imprecise result    |result |IEEE_IMP
Underflow  |exception|                     |       |
           |---------|---------------------|-------|-------------
           |unmasked | Precise result      |result |IEEE_UFL
           |underflow|---------------------|-------|-------------
           |exception| Imprecise result    |Rounded|IEEE_UFL and
           |         |                     |result |IEEE_IMP
-----------|-------------------------------|-------|-------------
           |masked imprecise exception     |Rounded|IEEE_IMP
Imprecise  |                               |result |
           |-------------------------------|-------|-------------
           |unmasked imprecise exception   |Rounded|IEEE_IMP
           |                               |result |
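The Overflow rows of the table can be read as a small decision function. The following is a minimal sketch of that reading; the enumeration names are hypothetical stand-ins for the package's rounding mode constants, whose real values are defined in `IEEE.h'.

```c
#include <assert.h>

/* Hypothetical stand-ins for the rounding modes of the package.  */
enum round_mode { ROUND_NEAREST, ROUND_MINUS, ROUND_PLUS, ROUND_ZERO };

enum overflow_result
{ POSITIVE_INF, POSITIVE_MAX, NEGATIVE_INF, NEGATIVE_MAX };

/* Result of a masked overflow according to the table: a positive
   overflow gives +Inf under round to nearest or toward plus infinity
   and +Max otherwise; a negative overflow is the mirror image.  */
enum overflow_result
masked_overflow_result (int negative, enum round_mode mode)
{
  assert (negative == 0 || negative == 1);
  if (!negative)
    return (mode == ROUND_NEAREST || mode == ROUND_PLUS
            ? POSITIVE_INF : POSITIVE_MAX);
  return (mode == ROUND_NEAREST || mode == ROUND_MINUS
          ? NEGATIVE_INF : NEGATIVE_MAX);
}
```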
The package uses package `bits'. The interface part of the abstract data is file `IEEE.h'. The implementation part is file `IEEE.c'. The interface contains the following external definitions:
`IEEE_FLOAT_SIZE', `IEEE_DOUBLE_SIZE', `IEEE_QUAD_SIZE' have values which are the sizes of IEEE single, double, and quad precision floating point numbers (`4', `8', and `16' respectively).
have values which are the maximal lengths of the strings generated by the functions that create the decimal ASCII representation of IEEE floats (see functions IEEE_single_to_string, IEEE_double_to_string, and IEEE_quad_to_string).
have values which are the maximal lengths of the strings generated by the functions that create the binary ASCII representation of IEEE floats with a given base (see functions IEEE_single_to_binary_string, IEEE_double_to_binary_string, and IEEE_quad_to_binary_string).
`IEEE_float_t', `IEEE_double_t', `IEEE_quad_t' represent respectively IEEE single precision, double precision, and quad precision floating point numbers. The sizes of these types are equal to `IEEE_FLOAT_SIZE', `IEEE_DOUBLE_SIZE', and `IEEE_QUAD_SIZE'.
`void IEEE_reset (void)'
and to separate bits in mask returned by functions
`IEEE_get_sticky_status_bits',
`IEEE_get_status_bits', and
`IEEE_get_trap_mask'.
`int IEEE_get_trap_mask (void)'
returns the exception trap mask. The function
`int IEEE_set_trap_mask (int mask)'
sets up a new exception trap mask and returns the previous one.
If the mask bit corresponding to a given exception is set, a floating point exception trap does not occur for that exception. Such an exception is said to be masked. The initial exception trap mask is zero. Remember that more than one exception may occur simultaneously.
`int IEEE_set_sticky_status_bits (int mask)'
changes the sticky status bits and returns the previous bits.
The function
`int IEEE_get_sticky_status_bits (void)'
returns the mask of the current sticky status bits. Only sticky
status bits corresponding to masked exceptions are updated,
regardless of whether a floating point exception trap is taken
or not. The initial values of the sticky status bits are zero.
`int IEEE_get_status_bits (void)'
returns the mask of status bits. The function is intended to be
used in a trap on a floating point exception. Status bits are
updated regardless of the current exception trap mask, but only
when a floating point exception trap is taken. The initial
values of the status bits are zero.
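The interplay of the trap mask, the sticky status bits, and the status bits described above can be modelled as follows. This is only a sketch of the stated rules: the structure and function names are hypothetical, the bit values are arbitrary, and overwriting (rather than accumulating) the status bits on a trap is an assumption.

```c
#include <assert.h>

/* Hypothetical model of the exception state of the package.  */
struct exception_state
{
  int trap_mask;    /* a set bit means the exception is masked */
  int sticky_bits;  /* accumulated masked exceptions */
  int status_bits;  /* exceptions seen when the last trap was taken */
};

/* Record the exceptions raised by one operation.  Returns 1 if a
   trap would be taken, i.e. if some raised exception is unmasked.  */
int
record_exceptions (struct exception_state *state, int raised)
{
  int trap_taken;

  assert (state != 0);
  trap_taken = (raised & ~state->trap_mask) != 0;
  /* Sticky bits follow masked exceptions whether or not a trap
     is taken.  */
  state->sticky_bits |= raised & state->trap_mask;
  /* Status bits change only when a trap is taken, and then cover all
     raised exceptions regardless of the mask.  Overwriting the old
     value is an assumption of this sketch.  */
  if (trap_taken)
    state->status_bits = raised;
  return trap_taken;
}
```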
`IEEE_RN', `IEEE_RM', `IEEE_RP', `IEEE_RZ' define rounding control (round to nearest representable number, round toward minus infinity, round toward plus infinity, round toward zero).
Round to nearest means the result produced is the representable value nearest to the infinitely precise result. There are special cases when the infinitely precise result falls exactly halfway between two representable values. In these cases the result is whichever of those two representable values has a fractional part whose least significant bit is zero.
Round toward minus infinity means the result produced is the representable value closest to but no greater than the infinitely precise result.
Round toward plus infinity means the result produced is the representable value closest to but no less than the infinitely precise result.
Round toward zero means the result produced is the representable value closest to but no greater in magnitude than the infinitely precise result. There are two functions
`int IEEE_set_round (int round_mode)'
which sets up the current rounding mode and returns the previous
mode and
`int IEEE_get_round (void)'
which returns the current mode. The initial rounding mode is
round to nearest.
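The four rounding modes can be illustrated on integers: the sketch below drops a given number of low-order bits from the magnitude of a number, which is the same decision a floating point operation makes when a result has more fraction bits than the format holds. The names are hypothetical and the code is not taken from the package.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the rounding modes of the package.  */
enum round_mode { ROUND_NEAREST, ROUND_MINUS, ROUND_PLUS, ROUND_ZERO };

/* Drop the DROP low-order bits of MAGNITUDE (the absolute value of a
   number whose sign is given by NEGATIVE) under the given mode.  */
uint64_t
round_magnitude (uint64_t magnitude, int negative, unsigned drop,
                 enum round_mode mode)
{
  uint64_t kept, rest, half;

  assert (drop >= 1 && drop <= 63);
  kept = magnitude >> drop;
  rest = magnitude & (((uint64_t) 1 << drop) - 1);
  half = (uint64_t) 1 << (drop - 1);
  if (rest == 0)
    return kept;                 /* exactly representable */
  switch (mode)
    {
    case ROUND_NEAREST:
      /* Above halfway rounds up; exactly halfway goes to the value
         whose least significant bit is zero.  */
      if (rest > half || (rest == half && (kept & 1) != 0))
        kept++;
      break;
    case ROUND_MINUS:
      if (negative)
        kept++;                  /* larger magnitude is more negative */
      break;
    case ROUND_PLUS:
      if (!negative)
        kept++;
      break;
    case ROUND_ZERO:
      break;                     /* truncate the magnitude */
    }
  return kept;
}
```

With two dropped bits, the magnitude 10 (binary 10.10, i.e. 2.5) is a halfway case and rounds to the even value 2 under round to nearest, while 14 (binary 11.10, i.e. 3.5) rounds to 4.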
`void default_floating_point_exception_trap (void)'
Initially, the reaction to a trap on an unmasked floating
point exception is this function. The function does nothing.
All exceptions that occurred can be found in the trap with
the aid of the status bits.
`void (*IEEE_set_floating_point_exception_trap
(void (*function) (void))) (void)'
sets up the trap for an unmasked exception. The function given
as the parameter simulates the floating point exception trap.
`IEEE_float_t IEEE_positive_zero (void)'
returns positive single precision zero constant. There are
analogous functions which return other special case values:
`IEEE_negative_zero',
`IEEE_NaN',
`IEEE_trapping_NaN',
`IEEE_positive_infinity',
`IEEE_negative_infinity',
`IEEE_double_positive_zero',
`IEEE_double_negative_zero',
`IEEE_double_NaN',
`IEEE_double_trapping_NaN',
`IEEE_double_positive_infinity',
`IEEE_double_negative_infinity',
`IEEE_quad_positive_zero',
`IEEE_quad_negative_zero',
`IEEE_quad_NaN',
`IEEE_quad_trapping_NaN',
`IEEE_quad_positive_infinity',
`IEEE_quad_negative_infinity'.
According to the IEEE standard, a NaN (and a trapping NaN) can be represented by more than one bit string. However, all functions of the package generate and use only the one representation created by function `IEEE_NaN' (and `IEEE_trapping_NaN', `IEEE_double_NaN', `IEEE_double_trapping_NaN', `IEEE_quad_NaN', `IEEE_quad_trapping_NaN'). A (quiet) NaN does not cause an Invalid Operation exception and can be reported as an operation result. A trapping NaN causes an Invalid Operation exception if used as an input operand of a floating point operation. A trapping NaN cannot be reported as an operation result.
`int IEEE_is_positive_zero (IEEE_float_t single_float)'
returns 1 if given number is positive single precision zero
constant. There are analogous functions for other special
case values:
`IEEE_is_negative_zero',
`IEEE_is_NaN',
`IEEE_is_trapping_NaN',
`IEEE_is_positive_infinity',
`IEEE_is_negative_infinity',
`IEEE_is_positive_maximum' (positive max value),
`IEEE_is_negative_maximum',
`IEEE_is_positive_minimum' (positive min value),
`IEEE_is_negative_minimum',
`IEEE_is_double_positive_zero',
`IEEE_is_double_negative_zero',
`IEEE_is_double_NaN',
`IEEE_is_double_trapping_NaN',
`IEEE_is_double_positive_infinity',
`IEEE_is_double_negative_infinity',
`IEEE_is_double_positive_maximum',
`IEEE_is_double_negative_maximum',
`IEEE_is_double_positive_minimum',
`IEEE_is_double_negative_minimum',
`IEEE_is_quad_positive_zero',
`IEEE_is_quad_negative_zero',
`IEEE_is_quad_NaN',
`IEEE_is_quad_trapping_NaN',
`IEEE_is_quad_positive_infinity',
`IEEE_is_quad_negative_infinity',
`IEEE_is_quad_positive_maximum',
`IEEE_is_quad_negative_maximum',
`IEEE_is_quad_positive_minimum',
`IEEE_is_quad_negative_minimum'.
Although all functions of the package generate and use only
the one representation created by function `IEEE_NaN' (or
`IEEE_trapping_NaN', or `IEEE_double_NaN', or
`IEEE_double_trapping_NaN', or `IEEE_quad_NaN', or
`IEEE_quad_trapping_NaN'), the function `IEEE_is_NaN' (and
`IEEE_is_trapping_NaN', and `IEEE_is_double_NaN', and
`IEEE_is_double_trapping_NaN', and `IEEE_is_quad_NaN', and
`IEEE_is_quad_trapping_NaN') recognizes any representation of
NaN.
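Recognizing any representation of NaN amounts to a test on the bit fields rather than a comparison with one fixed bit string. A minimal sketch for the single precision format, assuming 8-bit bytes and the big endian byte layout; the function name is hypothetical and the test shown is the one defined by the IEEE standard, not code taken from the package:

```c
#include <assert.h>

/* Test the 4 big endian bytes of a single precision number.  A NaN
   has all 8 exponent bits set and a nonzero 23-bit fraction, so every
   NaN bit string is recognized, not just one canonical form.  */
int
single_bytes_are_nan (const unsigned char bytes[4])
{
  unsigned exponent;
  unsigned long fraction;

  assert (bytes != 0);
  exponent = ((bytes[0] & 0x7Fu) << 1) | (bytes[1] >> 7);
  fraction = ((unsigned long) (bytes[1] & 0x7Fu) << 16)
             | ((unsigned long) bytes[2] << 8) | bytes[3];
  return exponent == 0xFF && fraction != 0;
}
```

The same exponent test with a zero fraction identifies an infinity, which is why the fraction check is needed.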
`int IEEE_is_normalized (IEEE_float_t single_float)'
returns TRUE if the single precision number is normalized
(special case values are not normalized). There is an analogous
function
`IEEE_is_denormalized'
for determining whether a number is denormalized. There are
analogous functions
`IEEE_is_double_normalized' and
`IEEE_is_double_denormalized' and
`IEEE_is_quad_normalized' and
`IEEE_is_quad_denormalized'
for doubles and quads.
`IEEE_float_t IEEE_add_single (IEEE_float_t single1,
IEEE_float_t single2)'
performs single precision addition of floating point numbers.
There are analogous functions which implement the other
floating point operations:
`IEEE_subtract_single',
`IEEE_multiply_single',
`IEEE_divide_single',
`IEEE_add_double',
`IEEE_subtract_double',
`IEEE_multiply_double',
`IEEE_divide_double',
`IEEE_add_quad',
`IEEE_subtract_quad',
`IEEE_multiply_quad',
`IEEE_divide_quad'.
Results and input exceptions for special case operand values
(except for NaNs) are given for addition by the following
table:
first  |            second operand
operand|---------------------------------------
       |     +Inf     |    -Inf     |  Others
-------|--------------|-------------|----------
+Inf   |     +Inf     |     NaN     |   +Inf
       |     none     |IEEE_INV(_RO)|   none
-------|--------------|-------------|----------
-Inf   |     NaN      |    -Inf     |   -Inf
       |IEEE_INV(_RO) |    none     |   none
-------|--------------|-------------|----------
Others |     +Inf     |    -Inf     |
       |     none     |    none     |
Results and input exceptions for special case operand values
(except for NaNs) are given for subtraction by the following
table:
first  |            second operand
operand|---------------------------------------
       |    +Inf     |     -Inf     |  Others
-------|-------------|--------------|----------
+Inf   |     NaN     |     +Inf     |   +Inf
       |IEEE_INV(_RO)|     none     |   none
-------|-------------|--------------|----------
-Inf   |    -Inf     |     NaN      |   -Inf
       |    none     |IEEE_INV(_RO) |   none
-------|-------------|--------------|----------
Others |    -Inf     |     +Inf     |
       |    none     |     none     |
Results and input exceptions for special case operand values
(except for NaNs) are given for multiplication by the
following table:
first  |                  second operand
operand|---------------------------------------------------
       |    +Inf     |    -Inf     |      0      | Others
-------|-------------|-------------|-------------|---------
+Inf   |    +Inf     |    -Inf     |     NaN     | (+-)Inf
       |    none     |    none     |IEEE_INV(_RO)|  none
-------|-------------|-------------|-------------|---------
-Inf   |    -Inf     |    +Inf     |     NaN     | (+-)Inf
       |    none     |    none     |IEEE_INV(_RO)|  none
-------|-------------|-------------|-------------|---------
0      |     NaN     |     NaN     |    (+-)0    |  (+-)0
       |IEEE_INV(_RO)|IEEE_INV(_RO)|    none     |  none
-------|-------------|-------------|-------------|---------
Others |   (+-)Inf   |   (+-)Inf   |    (+-)0    |
       |    none     |    none     |    none     |
Results and input exceptions for special case operand values
(except for NaNs) are given for division by the following
table:
first  |                  second operand
operand|---------------------------------------------------
       |    +Inf     |    -Inf     |      0      | Others
-------|-------------|-------------|-------------|---------
+Inf   |     NaN     |     NaN     |   (+-)Inf   | (+-)Inf
       |IEEE_INV(_RO)|IEEE_INV(_RO)|    none     |  none
-------|-------------|-------------|-------------|---------
-Inf   |     NaN     |     NaN     |   (+-)Inf   | (+-)Inf
       |IEEE_INV(_RO)|IEEE_INV(_RO)|    none     |  none
-------|-------------|-------------|-------------|---------
0      |    (+-)0    |    (+-)0    |     NaN     |  (+-)0
       |    none     |    none     |IEEE_INV(_RO)|  none
-------|-------------|-------------|-------------|---------
Others |    (+-)0    |    (+-)0    |   (+-)Inf   |
       |    none     |    none     |   IEEE_DZ   |
`int IEEE_eq_single (IEEE_float_t single1,
IEEE_float_t single2)'
compares two single precision floating point numbers for
equality and returns 1 or 0 depending on the result of the
comparison. There are analogous functions which implement
the other comparison operations:
`IEEE_ne_single',
`IEEE_gt_single',
`IEEE_lt_single',
`IEEE_ge_single',
`IEEE_le_single',
`IEEE_eq_double',
`IEEE_ne_double',
`IEEE_gt_double',
`IEEE_lt_double',
`IEEE_ge_double',
`IEEE_le_double',
`IEEE_eq_quad',
`IEEE_ne_quad',
`IEEE_gt_quad',
`IEEE_lt_quad',
`IEEE_ge_quad',
`IEEE_le_quad'.
Results and input exceptions for special case operand values
are given for equality and inequality by the following
table:
first  |            second operand
operand|---------------------------------------
       |    SNaN     |     QNaN     |  Others
-------|-------------|--------------|----------
SNaN   |    FALSE    |    FALSE     |  FALSE
       |  IEEE_INV   |   IEEE_INV   | IEEE_INV
-------|-------------|--------------|----------
QNaN   |    FALSE    |    FALSE     |  FALSE
       |  IEEE_INV   |     none     |   none
-------|-------------|--------------|----------
Others |    FALSE    |    FALSE     |
       |  IEEE_INV   |     none     |
Results and input exceptions for special case operand values
are given for the other comparison operations by the
following table:
first  |            second operand
operand|---------------------------------------
       |    SNaN     |     QNaN     |  Others
-------|-------------|--------------|----------
SNaN   |    FALSE    |    FALSE     |  FALSE
       |  IEEE_INV   |   IEEE_INV   | IEEE_INV
-------|-------------|--------------|----------
QNaN   |    FALSE    |    FALSE     |  FALSE
       |  IEEE_INV   |   IEEE_INV   | IEEE_INV
-------|-------------|--------------|----------
Others |    FALSE    |    FALSE     |
       |  IEEE_INV   |   IEEE_INV   |
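The two comparison tables differ only in how a quiet NaN is treated. A minimal sketch that encodes just the exception column of each table; the operand kinds and function names are hypothetical and serve only to make the difference explicit:

```c
#include <assert.h>

/* Hypothetical classification of a comparison operand.  */
enum operand_kind { KIND_SNAN, KIND_QNAN, KIND_OTHER };

/* Equality and inequality: the invalid operation exception is raised
   only if an operand is a trapping NaN.  */
int
equality_raises_invalid (int first, int second)
{
  assert (first >= KIND_SNAN && first <= KIND_OTHER);
  assert (second >= KIND_SNAN && second <= KIND_OTHER);
  return first == KIND_SNAN || second == KIND_SNAN;
}

/* Ordering comparisons: the invalid operation exception is raised
   if an operand is any NaN, quiet or trapping.  */
int
ordering_raises_invalid (int first, int second)
{
  assert (first >= KIND_SNAN && first <= KIND_OTHER);
  assert (second >= KIND_SNAN && second <= KIND_OTHER);
  return first != KIND_OTHER || second != KIND_OTHER;
}
```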
`IEEE_double_t IEEE_single_to_double
(IEEE_float_t single_float)',
`IEEE_float_t IEEE_double_to_single
(IEEE_double_t double_float)',
`IEEE_quad_t IEEE_single_to_quad
(IEEE_float_t single_float)',
`IEEE_float_t IEEE_quad_to_single
(IEEE_quad_t quad_float)',
`IEEE_quad_t IEEE_double_to_quad
(IEEE_double_t double_float)',
`IEEE_double_t IEEE_quad_to_double
(IEEE_quad_t quad_float)',
`IEEE_float_t IEEE_single_from_integer
(int size, const void *integer)',
`IEEE_float_t IEEE_single_from_unsigned_integer
(int size, const void *unsigned_integer)',
`IEEE_double_t IEEE_double_from_integer
(int size, const void *integer)',
`IEEE_double_t IEEE_double_from_unsigned_integer
(int size, const void *unsigned_integer)',
`IEEE_quad_t IEEE_quad_from_integer
(int size, const void *integer)',
`IEEE_quad_t IEEE_quad_from_unsigned_integer
(int size, const void *unsigned_integer)',
`void IEEE_single_to_integer
(int size, IEEE_float_t single_float, void *integer)',
`void IEEE_single_to_unsigned_integer
(int size, IEEE_float_t single_float,
void *unsigned_integer)',
`void IEEE_double_to_integer
(int size, IEEE_double_t double_float, void *integer)',
`void IEEE_double_to_unsigned_integer
(int size, IEEE_double_t double_float,
void *unsigned_integer)',
`void IEEE_quad_to_integer
(int size, IEEE_quad_t quad_float, void *integer)',
`void IEEE_quad_to_unsigned_integer
(int size, IEEE_quad_t quad_float,
void *unsigned_integer)'.
No output exceptions occur during transformation of a single
precision floating point number to a double or quad precision
number, or of a double precision floating point number to a
quad precision number. No input exceptions occur during
transformation of integer numbers to floating point numbers.
Results and input exceptions for special case operand values
(and for NaNs) are given for conversion of a floating point
number to an integer by the following table:
Operand        | Result & Exception
---------------|-------------------
SNaN           |         0
               |   IEEE_INV(_RO)
---------------|-------------------
QNaN           |         0
               |   IEEE_INV(_RO)
---------------|-------------------
+Inf           |       IMax
               |     IEEE_INV
---------------|-------------------
-Inf           |       IMin
               |     IEEE_INV
---------------|-------------------
Others         |
               |
Results and input exceptions for special case operand values
(and for NaNs) are given for conversion of a floating point
number to an unsigned integer by the following table:
Operand        | Result & Exception
---------------|-------------------
SNaN           |         0
               |   IEEE_INV(_RO)
---------------|-------------------
QNaN           |         0
               |   IEEE_INV(_RO)
---------------|-------------------
+Inf           |       IMax
               |     IEEE_INV
---------------|-------------------
-Inf or        |         0
negative number|     IEEE_INV
---------------|-------------------
Others         |
               |
Results and exceptions for NaNs during transformation of
floating point numbers to (unsigned) integers differ from the
ones for the operations of addition, multiplication, and so
on.
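The special case rows of the signed conversion table can be sketched with a host `double' standing in for the package's byte representation. The function name is hypothetical. The table leaves the "Others" row empty, so truncation toward zero and saturation of finite out-of-range values (with the invalid flag reported) are assumptions of this sketch, not documented behavior.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Convert VALUE to a 32-bit signed integer.  NaN gives 0, +Inf the
   maximal integer, -Inf the minimal one, as in the table.  Returns 1
   if an invalid operation exception would be raised.  Handling of
   finite values (truncation, saturation) is an assumption.  */
int
double_to_int32 (double value, int32_t *result)
{
  assert (result != 0);
  if (isnan (value))
    {
      *result = 0;
      return 1;
    }
  if (value >= 2147483648.0)        /* includes +Inf */
    {
      *result = INT32_MAX;
      return 1;
    }
  if (value <= -2147483649.0)       /* includes -Inf */
    {
      *result = INT32_MIN;
      return 1;
    }
  *result = (int32_t) value;        /* in range: C truncates toward zero */
  return 0;
}
```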
`char *IEEE_single_to_string (IEEE_float_t single_float,
char *result)'
transforms a single precision number to its decimal ASCII
representation with an obligatory integer part (1 digit), a
fractional part (of constant length), and an optional exponent.
A minus sign is present where needed. The special case IEEE
floating point values are represented by the strings `SNaN',
`QNaN', `+Inf', `-Inf', `+0', and `-0'. The function returns the
value `result'. There are analogous functions
`IEEE_double_to_string'
`IEEE_quad_to_string'
for doubles and quads. The current rounding mode does not affect
the resulting ASCII representation. The function outputs 9
decimal fraction digits for a single precision number, 17
decimal fraction digits for a double precision number, and 36
for a quad precision number.
`char *IEEE_single_to_binary_string (IEEE_float_t single_float,
int base, char *result)'
The function is analogous to IEEE_single_to_string but
transforms the number into a binary ASCII representation
with an obligatory integer part (1 digit) in the given base, an
optional fractional part in the given base, and an optional
binary exponent (a decimal number giving the power of 2). The
binary exponent starts with the character `p' instead of `e'.
A minus sign is present where needed. The special case IEEE
floating point values are represented by the strings `SNaN',
`QNaN', `+Inf', `-Inf', `+0', and `-0'. The function returns the
value `result'. The value of parameter base should be 2, 4, 8,
or 16. There are analogous functions
`IEEE_double_to_binary_string'
`IEEE_quad_to_binary_string'
for doubles and quads. The current rounding mode does not affect
the resulting ASCII representation.
`char *IEEE_single_from_string (const char *operand,
IEEE_float_t *result)'
skips all white space at the beginning of the source string and
transforms the rest of the source string to a single precision
floating point number. The number must correspond to the
following syntax
['+' | '-'] [<decimal digits>] [ '.' [<decimal digits>] ]
[ ('e' | 'E') ['+' | '-'] <decimal digits>]
or must be one of the strings `SNaN', `QNaN', `+Inf',
`-Inf', `+0', or `-0'. The function returns a pointer to the
first character in the source string after the floating point
number that was read. If the string does not correspond to the
floating point number syntax, the result is zero and the
function returns the source string.
The function can raise output exceptions as described above. There are analogous functions
`IEEE_double_from_string'
`IEEE_quad_from_string'
for doubles and quads. The current rounding mode may affect
the resulting floating point number. It is guaranteed that the
transformation `IEEE floating point number -> string -> IEEE
floating point number' results in the same IEEE floating point
number if round to nearest mode is used. But the reverse
transformation `string with 9 (or 17 or 36) digits -> IEEE
floating point number -> string' may result in different
fraction digits in the ASCII representation, because one
floating point number may represent several such strings that
differ in the least significant digit. The ASCII
representations are identical, however, when functions
`IEEE_single_from_string', `IEEE_double_from_string',
`IEEE_quad_from_string' do not raise the imprecise result
exception or when fewer than 9 (or 17 or 36) fraction digits of
the ASCII representations are compared.
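The round trip guarantee can be tried out on the host with standard C functions, as a rough analogue of the package functions. This sketch assumes the host `float' is an IEEE single precision number and that the host library converts with correct rounding; it uses `snprintf' and `strtof', not the package, and prints 9 significant decimal digits, which is the known minimum that identifies every single precision number.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Return 1 if VALUE survives the trip float -> decimal string ->
   float.  "%.8e" prints 1 integer digit and 8 fraction digits,
   i.e. 9 significant decimal digits.  */
int
single_round_trips (float value)
{
  char text[32];
  int length = snprintf (text, sizeof text, "%.8e", (double) value);

  assert (length > 0 && (size_t) length < sizeof text);
  return strtof (text, NULL) == value;
}
```

The reverse direction is not guaranteed: two different 9-digit strings can convert to the same `float', so printing that `float' again cannot reproduce both.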
`char *IEEE_single_from_binary_string (const char *operand,
int base,
IEEE_float_t *result)'
The function is analogous to IEEE_single_from_string but
transforms the binary representation of a single precision
floating point number. The number must correspond to the
following syntax
['+' | '-'] [<digits less than base>] [ '.' [<digits less than base>] ]
[ ('p' | 'P') ['+' | '-'] <decimal digits>]
or must be one of the strings `SNaN', `QNaN', `+Inf',
`-Inf', `+0', or `-0'. The function returns a pointer to the
first character in the source string after the floating point
number that was read. If the string does not correspond to the
floating point number syntax, the result is zero and the
function returns the source string. The exponent (after the
character `p' or `P') defines a power of two.
The function can raise output exceptions as described above. There are analogous functions
`IEEE_double_from_binary_string'
`IEEE_quad_from_binary_string'
for doubles and quads. The current rounding mode can affect
the resulting floating point number if too many digits are
given.
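The binary string syntax and the returned pointer can be illustrated by a small parser that produces a host `double' instead of the package's byte representation. All names here are hypothetical. The sketch requires at least one digit, does not handle the special strings (`SNaN', `+Inf', and so on), raises no exceptions, and assumes an ASCII-compatible character set.

```c
#include <assert.h>
#include <ctype.h>

/* Value of C as a digit in bases up to 16, or -1 if C is no digit.  */
static int
digit_value (int c)
{
  if (c >= '0' && c <= '9')
    return c - '0';
  if (c >= 'a' && c <= 'f')
    return c - 'a' + 10;
  if (c >= 'A' && c <= 'F')
    return c - 'A' + 10;
  return -1;
}

/* Parse OPERAND in the given base.  Returns a pointer just past the
   number, or OPERAND itself (with *RESULT zero) on a syntax error.  */
const char *
parse_binary_string (const char *operand, int base, double *result)
{
  const char *p = operand;
  double value = 0.0, scale = 1.0;
  int negative = 0, digits = 0, d;

  assert (base == 2 || base == 4 || base == 8 || base == 16);
  *result = 0.0;
  while (isspace ((unsigned char) *p))
    p++;
  if (*p == '+' || *p == '-')
    negative = (*p++ == '-');
  for (; (d = digit_value (*p)) >= 0 && d < base; p++, digits++)
    value = value * base + d;
  if (*p == '.')
    for (p++; (d = digit_value (*p)) >= 0 && d < base; p++, digits++)
      {
        scale /= base;
        value += d * scale;
      }
  if (digits == 0)
    return operand;              /* not a number: result stays zero */
  if (*p == 'p' || *p == 'P')
    {
      const char *q = p + 1;
      int exponent = 0, exponent_negative = 0;

      if (*q == '+' || *q == '-')
        exponent_negative = (*q++ == '-');
      if (isdigit ((unsigned char) *q))
        {
          for (; isdigit ((unsigned char) *q); q++)
            if (exponent < 10000)   /* saturate absurd exponents */
              exponent = exponent * 10 + (*q - '0');
          for (; exponent > 0; exponent--)
            value *= exponent_negative ? 0.5 : 2.0;
          p = q;                 /* the exponent is part of the number */
        }
    }
  *result = negative ? -value : value;
  return p;
}
```

Using `p' for the exponent is what makes base 16 unambiguous, since `e' and `E' are hexadecimal digits. For the string `1.8p1' in base 16 the digits give 1.5 and the exponent doubles it once.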