Abstract data `IEEE' may be used to implement a cross-compiler. It implements IEEE floating point arithmetic in a machine-independent way with the aid of package `arithm'. This is necessary because the host machine may not support the floating point arithmetic of the target machine. For example, VAX does not support IEEE floating point arithmetic. Floating point numbers are represented by bytes in big-endian order. The package functions are not efficient enough for run-time use; they are intended for constant folding in compilers. All integer sizes (see the transformation functions) are given in bytes and must be positive.
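The big-endian byte representation mentioned above can be illustrated with a minimal sketch. It is not part of the package: the function name `store_big_endian' is hypothetical, and the sketch assumes that the host `float' is itself an IEEE single precision number, which the package does not require.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Assumption: the host `float` is IEEE single precision. The package does
// not rely on this; the sketch only uses it to obtain a bit pattern.
static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "sketch assumes an IEEE binary32 host float");

// Hypothetical helper: stores the bit pattern of `value` into `bytes`,
// most significant byte first (big-endian order).
void store_big_endian(float value, std::uint8_t bytes[4]) {
  assert(bytes != nullptr);
  std::uint32_t bits = 0;
  std::memcpy(&bits, &value, sizeof bits);  // well-defined type punning
  for (int i = 0; i < 4; ++i) {
    bytes[i] = static_cast<std::uint8_t>(bits >> (8 * (3 - i)));
  }
}
```

For example, storing 1.0 yields the bytes 3F 80 00 00 regardless of the byte order of the host.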
The functions for addition, subtraction, multiplication, division, and conversion between floating point formats can raise input exceptions. If an operand of such an operation is a trapping (signaling) NaN, the invalid operation and reserved operand exceptions are raised and the result is a (quiet) NaN; otherwise, if an operand is a (quiet) NaN, only the reserved operand exception is raised and the result is a (quiet) NaN. Operation-specific processing of the remaining special case operand values is given with the description of each operation. In the general case a function can also raise output exceptions and produces results for them according to the following table. The result and status for a given exceptional operation are determined by the highest priority exception. If, for example, an operation produces both overflow and imprecise result exceptions, the overflow exception, having higher priority, determines the behavior of the operation, which is therefore described by the Overflow entry of the table.
 Exception |Condition|                     |Result |Status
-----------|---------|---------------------|-------|-------------
           |masked   |         IEEE_RN(_RP)| +Inf  |IEEE_OFL and
           |overflow | sign +  IEEE_RZ(_RM)| +Max  |IEEE_IMP
           |exception|---------------------|-------|-------------
 Overflow  |         | sign -  IEEE_RN(_RM)| -Inf  |IEEE_OFL and
           |         |         IEEE_RZ(_RP)| -Max  |IEEE_IMP
           |---------|---------------------|-------|-------------
           |unmasked | Precise result      |See    |IEEE_OFL
           |overflow |---------------------|above  |-------------
           |exception| Imprecise result    |       |IEEE_OFL and
           |         |                     |       |IEEE_IMP
-----------|---------|---------------------|-------|-------------
           |masked   |                     |Rounded|IEEE_UFL and
           |underflow| Imprecise result    |result |IEEE_IMP
 Underflow |exception|                     |       |
           |---------|---------------------|-------|-------------
           |unmasked | Precise result      |result |IEEE_UFL
           |underflow|---------------------|-------|-------------
           |exception| Imprecise result    |Rounded|IEEE_UFL and
           |         |                     |result |IEEE_IMP
-----------|-------------------------------|-------|-------------
           |masked imprecise exception     |Rounded|IEEE_IMP
 Imprecise |                               |result |
           |-------------------------------|-------|-------------
           |unmasked imprecise exception   |Rounded|IEEE_IMP
           |                               |result |
The package uses package `bits'. The interface part of the abstract data is file `IEEE.h'. The implementation part is file `IEEE.cpp'. The interface contains the following external definitions:
have values which are the sizes of IEEE single, double, and quad precision floating point numbers (`4', `8', and `16' respectively).
have values which are the maximum lengths of strings generated by the functions creating decimal ASCII representations of IEEE floats (see functions to_string).
have values which are the maximum lengths of strings generated by the functions creating binary ASCII representations of IEEE floats with a given base (see functions to_binary_string).
are simply synonyms of classes `IEEE_float', `IEEE_double', and `IEEE_quad', which represent IEEE single, double, and quad precision floating point numbers respectively.
defines rounding control (round to nearest representable number, round toward minus infinity, round toward plus infinity, round toward zero).
Round to nearest means the result produced is the representable value nearest to the infinitely precise result. There is a special case when the infinitely precise result falls exactly halfway between two representable values. In this case the result is whichever of those two representable values has a fractional part whose least significant bit is zero.
Round toward minus infinity means the result produced is the representable value closest to but no greater than the infinitely precise result.
Round toward plus infinity means the result produced is the representable value closest to but no less than the infinitely precise result.
Round toward zero means the result produced is the representable value closest to but no greater in magnitude than the infinitely precise result.
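The four rounding rules can be sketched on a sign-magnitude integer whose low-order bits have to be dropped, which is essentially what happens to a fraction that is too long. This is only an illustration under stated assumptions: the names `RoundMode' and `round_magnitude' are hypothetical and are not the rounding constants or functions of the package.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names; the package's own rounding constants are not used.
enum class RoundMode { Nearest, TowardMinusInf, TowardPlusInf, TowardZero };

// Drops the `shift` low-order bits of a sign-magnitude value and rounds the
// remaining magnitude according to `mode`. Requires 0 < shift < 64.
std::uint64_t round_magnitude(std::uint64_t magnitude, bool negative,
                              int shift, RoundMode mode) {
  assert(shift > 0 && shift < 64);
  const std::uint64_t kept = magnitude >> shift;
  const std::uint64_t dropped =
      magnitude & ((std::uint64_t{1} << shift) - 1);
  const std::uint64_t half = std::uint64_t{1} << (shift - 1);
  if (dropped == 0) return kept;  // exactly representable, nothing to round
  switch (mode) {
    case RoundMode::Nearest:
      if (dropped > half) return kept + 1;
      if (dropped < half) return kept;
      return kept + (kept & 1);  // halfway: choose the even neighbour
    case RoundMode::TowardMinusInf:
      return negative ? kept + 1 : kept;  // larger magnitude if negative
    case RoundMode::TowardPlusInf:
      return negative ? kept : kept + 1;  // larger magnitude if positive
    case RoundMode::TowardZero:
      return kept;  // always truncate
  }
  return kept;
}
```

With two bits dropped, the magnitude 10 (binary 1010) is a halfway case and rounds to 2 under round to nearest, while 14 (binary 1110) is also halfway and rounds to 4, since in both cases the even neighbour is chosen.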
The class has the following functions common for all packages:
`void reset (void)'
and to separate bits in mask returned by functions
`IEEE_get_sticky_status_bits',
`IEEE_get_status_bits', and
`IEEE_get_trap_mask'.
`void IEEE_reset (void)'
and to separate bits in mask returned by functions
`IEEE_get_sticky_status_bits',
`IEEE_get_status_bits', and
`IEEE_get_trap_mask'.
`int get_trap_mask (void)'
returns the exception trap mask. Static public function
`int set_trap_mask (int mask)'
sets up a new exception trap mask and returns the previous one.
If the mask bit corresponding to a given exception is set, a floating point exception trap does not occur for that exception. Such an exception is said to be a masked exception. The initial exception trap mask is zero. Remember that more than one exception may occur simultaneously.
`int set_sticky_status_bits (int mask)'
changes sticky status bits and returns the previous bits.
Static public function
`int get_sticky_status_bits (void)'
returns the mask of the current sticky status bits. Only
sticky status bits corresponding to masked exceptions are
updated, regardless of whether a floating point exception trap
is taken. The initial values of the sticky status bits are zero.
`int get_status_bits (void)'
returns the mask of the status bits. The function is meant
to be used in a trap on a floating point exception. Status
bits are updated regardless of the current exception trap mask,
but only when a floating point exception trap is taken. The
initial values of the status bits are zero.
`int set_round (int round_mode)'
which sets up the current rounding mode and returns the
previous mode, and
`int IEEE_get_round (void)'
which returns the current mode. The initial rounding mode is
round to nearest.
`void default_floating_point_exception_trap (void)'
Initially this function is the reaction to a trap on an
unmasked floating point exception. The function does nothing.
All exceptions that have occurred can be found in the trap with
the aid of the status bits.
`void (*set_floating_point_exception_trap
(void (*function) (void))) (void)'
sets up the trap on an unmasked exception. The function given
as the parameter simulates the floating point exception trap.
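The interplay of the trap mask, the sticky status bits, the status bits, and the trap function described above can be sketched as follows. This is a simplified model and not the package's implementation: the struct `ExceptionState', the bit constants with their values, and the helper `counting_trap' are all hypothetical, and the sketch assumes that the status bits are overwritten (rather than accumulated) each time a trap is taken.

```cpp
#include <cassert>

// Hypothetical bit values; the package's real constants may differ.
enum : int { kInv = 1, kRo = 2, kDz = 4, kOfl = 8, kUfl = 16, kImp = 32 };

struct ExceptionState {
  int trap_mask = 0;         // a set bit means the exception is masked
  int sticky_bits = 0;       // accumulated for masked exceptions only
  int status_bits = 0;       // written only when a trap is taken
  void (*trap)() = nullptr;  // simulated floating point exception trap

  // Records the exceptions produced by one operation.
  void raise(int exceptions) {
    sticky_bits |= exceptions & trap_mask;
    if ((exceptions & ~trap_mask) != 0) {
      // At least one unmasked exception: a trap is taken and the status
      // bits report everything that occurred, masked or not.
      status_bits = exceptions;  // assumption: overwritten, not OR-ed
      if (trap != nullptr) trap();
    }
  }
};

// Hypothetical trap function that only counts how often it is called.
inline int& trap_count() {
  static int count = 0;
  return count;
}
inline void counting_trap() { ++trap_count(); }
```

In this model, raising only the imprecise result exception while it is masked sets its sticky bit and takes no trap, whereas raising overflow together with imprecise result (overflow unmasked) takes the trap once and leaves both bits in the status bits.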
The classes implement IEEE floating point numbers in an object-oriented style. The following functions are described for class `IEEE_float'. Unless stated otherwise, the classes `IEEE_double' and `IEEE_quad' have analogous functions with the same names, operating on IEEE double and quad numbers.
`IEEE_float (void)'
`IEEE_float (float f)'
`IEEE_double (void)'
`IEEE_double (float f)'
`IEEE_quad (void)'
`IEEE_quad (float f)'
create IEEE single, double, or quad precision numbers with
the value positive zero or with the given value.
`void positive_zero (void)'
The given number becomes the positive single precision zero
constant. There are analogous functions for other special
case values:
`negative_zero',
`NaN',
`trapping_NaN',
`positive_infinity',
`negative_infinity',
According to the IEEE standard, a NaN (and a trapping NaN) can be represented by more than one bit string, but all functions of the package generate and use only the single representation created by function `NaN' (and `trapping_NaN'). A (quiet) NaN does not cause an Invalid Operation exception and can be reported as an operation result. A trapping NaN causes an Invalid Operation exception if used as an input operand of a floating point operation. A trapping NaN cannot be reported as an operation result.
`int is_positive_zero (void)'
returns 1 if the given number is the positive single
precision zero constant. There are analogous functions for
other special case values:
`is_negative_zero',
`is_NaN',
`is_trapping_NaN',
`is_positive_infinity',
`is_negative_infinity',
`is_positive_maximum' (positive max value),
`is_negative_maximum',
`is_positive_minimum' (positive min value),
`is_negative_minimum',
Although all functions of the package generate and use only
the single representation created by function `NaN' (or
`trapping_NaN'), the function `is_NaN' (and
`is_trapping_NaN') recognizes any representation of NaN.
`int is_normalized (void)'
returns TRUE if the given number is normalized (special case
values are not normalized). There is an analogous function
`is_denormalized'
for recognizing a denormalized number.
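The distinctions drawn by the predicates above follow from the exponent and fraction fields of the number. The following sketch classifies a single precision value given as 4 big-endian bytes. The names `FloatClass', `classify_single', and `is_negative' are hypothetical and do not belong to the package; the field layout (1 sign bit, 8 exponent bits, 23 fraction bits) is that of IEEE single precision.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical classification; not a type of the package.
enum class FloatClass { Zero, Denormalized, Normalized, Infinity, NaN };

// Classifies an IEEE single precision value stored as 4 big-endian bytes:
// 1 sign bit, 8 exponent bits, 23 fraction bits.
FloatClass classify_single(const std::uint8_t bytes[4]) {
  assert(bytes != nullptr);
  const unsigned exponent = ((bytes[0] & 0x7Fu) << 1) | (bytes[1] >> 7);
  const std::uint32_t fraction =
      (static_cast<std::uint32_t>(bytes[1] & 0x7Fu) << 16) |
      (static_cast<std::uint32_t>(bytes[2]) << 8) |
      static_cast<std::uint32_t>(bytes[3]);
  if (exponent == 0) {
    return fraction == 0 ? FloatClass::Zero : FloatClass::Denormalized;
  }
  if (exponent == 255) {
    // Any nonzero fraction is a NaN, whatever its exact bit string.
    return fraction == 0 ? FloatClass::Infinity : FloatClass::NaN;
  }
  return FloatClass::Normalized;
}

// The sign is the most significant bit of the first byte.
bool is_negative(const std::uint8_t bytes[4]) {
  assert(bytes != nullptr);
  return (bytes[0] & 0x80u) != 0;
}
```

The sketch does not separate quiet from trapping NaNs, because the text does not state which bit strings the package uses for them.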
`class IEEE_float operator + (class IEEE_float &op)'
performs single precision addition of floating point
numbers. There are analogous operators which implement other
floating point operations:
`-',
`*',
`/',
Results and input exceptions for special case operand values
(except NaNs) are given for addition in the following
table:
 first |           second operand
operand|---------------------------------------
       |    +Inf      |    -Inf     |  Others
-------|--------------|-------------|----------
 +Inf  |    +Inf      |     NaN     |   +Inf
       |    none      |IEEE_INV(_RO)|   none
-------|--------------|-------------|----------
 -Inf  |    NaN       |    -Inf     |   -Inf
       |IEEE_INV(_RO) |    none     |   none
-------|--------------|-------------|----------
Others |    +Inf      |    -Inf     |
       |    none      |    none     |
Results and input exceptions for special case operand values
(except NaNs) are given for subtraction in the following
table:
 first |           second operand
operand|---------------------------------------
       |    +Inf     |    -Inf      |  Others
-------|-------------|--------------|----------
 +Inf  |     NaN     |    +Inf      |   +Inf
       |IEEE_INV(_RO)|    none      |   none
-------|-------------|--------------|----------
 -Inf  |    -Inf     |     NaN      |   -Inf
       |    none     |IEEE_INV(_RO) |   none
-------|-------------|--------------|----------
Others |    -Inf     |    +Inf      |
       |    none     |    none      |
Results and input exceptions for special case operand values
(except NaNs) are given for multiplication in the following
table:
 first |             second operand
operand|--------------------------------------------
       |   +Inf   |   -Inf    |     0      | Others
-------|----------|-----------|------------|--------
 +Inf  |   +Inf   |   -Inf    |    NaN     | (+-)Inf
       |   none   |   none    |  IEEE_INV  |  none
       |          |           |   (_RO)    |
-------|----------|-----------|------------|--------
 -Inf  |   -Inf   |   +Inf    |    NaN     | (+-)Inf
       |   none   |   none    |  IEEE_INV  |  none
       |          |           |   (_RO)    |
-------|----------|-----------|------------|--------
   0   |   NaN    |    NaN    |   (+-)0    | (+-)0
       | IEEE_INV | IEEE_INV  |    none    |  none
       |  (_RO)   |   (_RO)   |            |
-------|----------|-----------|------------|--------
Others | (+-)Inf  |  (+-)Inf  |   (+-)0    |
       |   none   |   none    |    none    |
Results and input exceptions for special case operand values
(except NaNs) are given for division in the following
table:
 first |             second operand
operand|--------------------------------------------
       |   +Inf    |   -Inf    |     0     | Others
-------|-----------|-----------|-----------|--------
 +Inf  |    NaN    |    NaN    |  (+-)Inf  | (+-)Inf
       | IEEE_INV  | IEEE_INV  |   none    |  none
       |   (_RO)   |   (_RO)   |           |
-------|-----------|-----------|-----------|--------
 -Inf  |    NaN    |    NaN    |  (+-)Inf  | (+-)Inf
       | IEEE_INV  | IEEE_INV  |   none    |  none
       |   (_RO)   |   (_RO)   |           |
-------|-----------|-----------|-----------|--------
   0   |   (+-)0   |   (+-)0   |    NaN    | (+-)0
       |   none    |   none    | IEEE_INV  |  none
       |           |           |   (_RO)   |
-------|-----------|-----------|-----------|--------
Others |   (+-)0   |   (+-)0   |  (+-)Inf  |
       |   none    |   none    |  IEEE_DZ  |
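The division table can be read as a small decision procedure. The sketch below encodes it directly. It is an illustration only: the types `Kind', `Result', and `Outcome', the function `divide_special', and the bit values are hypothetical. It further assumes that the entry `IEEE_INV(_RO)' means that both the invalid operation and the reserved operand exceptions are raised, and that the sign written as (+-) is negative exactly when the operand signs differ, which is the usual IEEE rule.

```cpp
#include <cassert>

// Hypothetical operand and result classes; NaN operands are excluded,
// as in the table.
enum class Kind { Zero, Infinity, Other };
enum class Result { NaN, Zero, Infinity, Computed };

// Hypothetical bit values; the package's real constants may differ.
enum : int { kNone = 0, kInv = 1, kRo = 2, kDz = 4 };

struct Outcome {
  Result result;   // Computed: ordinary division, not fixed by the table
  bool negative;   // sign of the result (meaningless for NaN)
  int exceptions;  // input exceptions raised
};

Outcome divide_special(Kind a, bool a_negative, Kind b, bool b_negative) {
  // Assumption: the result is negative when the operand signs differ.
  const bool negative = a_negative != b_negative;
  if ((a == Kind::Infinity && b == Kind::Infinity) ||
      (a == Kind::Zero && b == Kind::Zero)) {
    return {Result::NaN, false, kInv | kRo};  // Inf/Inf and 0/0
  }
  if (a == Kind::Infinity) return {Result::Infinity, negative, kNone};
  if (b == Kind::Infinity) return {Result::Zero, negative, kNone};
  if (a == Kind::Zero) return {Result::Zero, negative, kNone};
  if (b == Kind::Zero) return {Result::Infinity, negative, kDz};
  return {Result::Computed, negative, kNone};
}
```

Note that only the division of an ordinary number by zero raises `IEEE_DZ'; infinity divided by zero yields an infinity without any exception.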
`int operator == (class IEEE_float &op)'
compares two floating point numbers for equality and returns
1 or 0 depending on the result of the comparison. There are
analogous operators which implement the other comparison
operations (all of them return an integer):
`!=',
`>',
`>=',
`<',
`<='.
Results and input exceptions for special case operand values
are given for equality and inequality in the following
table:
 first |           second operand
operand|---------------------------------------
       |    SNaN     |     QNaN     |  Others
-------|-------------|--------------|----------
 SNaN  |    FALSE    |    FALSE     |  FALSE
       |  IEEE_INV   |   IEEE_INV   | IEEE_INV
-------|-------------|--------------|----------
 QNaN  |    FALSE    |    FALSE     |  FALSE
       |  IEEE_INV   |     none     |   none
-------|-------------|--------------|----------
Others |    FALSE    |    FALSE     |
       |  IEEE_INV   |     none     |
Results and input exceptions for special case operand values
are given for the other comparison operations in the
following table:
 first |           second operand
operand|---------------------------------------
       |    SNaN     |     QNaN     |  Others
-------|-------------|--------------|----------
 SNaN  |    FALSE    |    FALSE     |  FALSE
       |  IEEE_INV   |   IEEE_INV   | IEEE_INV
-------|-------------|--------------|----------
 QNaN  |    FALSE    |    FALSE     |  FALSE
       |  IEEE_INV   |   IEEE_INV   | IEEE_INV
-------|-------------|--------------|----------
Others |    FALSE    |    FALSE     |
       |  IEEE_INV   |   IEEE_INV   |
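The only difference between the two comparison tables is the treatment of a quiet NaN: for equality and inequality it is compared silently, for the ordering operators it raises the invalid operation exception. A compact sketch of both tables follows; the names `Operand', `Comparison', and `compare_special' are hypothetical and not part of the package.

```cpp
#include <cassert>

// Hypothetical operand classes; "Other" stands for every non-NaN value.
enum class Operand { SNaN, QNaN, Other };

struct Comparison {
  bool decided;  // true when the tables fix the result (it is then FALSE)
  bool invalid;  // true when the invalid operation exception is raised
};

// `is_equality` is true for `==` and `!=`, false for `<`, `<=`, `>`, `>=`.
Comparison compare_special(bool is_equality, Operand a, Operand b) {
  const bool any_snan = a == Operand::SNaN || b == Operand::SNaN;
  const bool any_nan =
      any_snan || a == Operand::QNaN || b == Operand::QNaN;
  if (!any_nan) return {false, false};  // ordinary comparison
  // Equality raises the exception only for a trapping NaN; the ordering
  // operators raise it for any NaN.
  return {true, is_equality ? any_snan : true};
}
```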
`char *to_string (char *result)'
transforms the single precision number to a decimal ASCII
representation with an obligatory integer part (1 digit), a
fractional part (of constant length), and an optional
exponent. A minus sign is present where needed. The special
case IEEE floating point values are represented by strings
`SNaN', `QNaN', `+Inf', `-Inf', `+0', and `-0'. The functions
return value `result'. The current rounding mode does not
affect the resulting ASCII representation. The functions
output 9 decimal fraction digits for a single precision
number, 17 decimal fraction digits for a double precision
number, and 36 decimal fraction digits for a quad precision
number.
`char *to_binary_string (int base, char *result)'
The functions are analogous to to_string but transform the
floating point number into a binary ASCII representation with
an obligatory integer part (1 digit) of the given base, an
optional fractional part of the given base, and an optional
binary exponent (a decimal number giving the power of 2). The
binary exponent starts with character `p' instead of `e'. A
minus sign is present where needed. The special case IEEE
floating point values are represented by strings `SNaN',
`QNaN', `+Inf', `-Inf', `+0', and `-0'. The functions return
value `result'. The value of parameter base should be 2, 4,
8, or 16. The current rounding mode does not affect the
resulting ASCII representation.
`char *from_string (const char *operand)'
skips all white space at the beginning of the source string
and transforms the rest of the source string to a single
precision floating point number. The number must conform to
the following syntax
['+' | '-'] [<decimal digits>]
[ '.' [<decimal digits>] ]
[ ('e' | 'E') ['+' | '-'] <decimal digits>]
or must be one of the strings `SNaN', `QNaN', `+Inf',
`-Inf', `+0', or `-0'. The functions return a pointer to the
first character in the source string after the floating point
number that was read. If the string does not conform to the
floating point number syntax, the result is zero and the
functions return the source string.
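The syntax and the return convention described above can be sketched with a small recognizer that only finds the end of the number and does not compute its value. The function name `skip_decimal_float' is hypothetical. Since every part of the syntax is optional as written, the sketch additionally assumes that at least one digit must be present and that an `e' without following digits is not part of the number; the package may decide these cases differently.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>

// Returns a pointer just past the number at the start of `source` (after
// leading white space), or `source` itself if no number is recognized.
const char* skip_decimal_float(const char* source) {
  assert(source != nullptr);
  const char* p = source;
  while (std::isspace(static_cast<unsigned char>(*p))) ++p;

  // Special strings listed in the text. `+0' and `-0' need no entry here
  // because they already match the general syntax.
  static const char* const kSpecial[] = {"SNaN", "QNaN", "+Inf", "-Inf"};
  for (const char* s : kSpecial) {
    const std::size_t n = std::strlen(s);
    if (std::strncmp(p, s, n) == 0) return p + n;
  }

  if (*p == '+' || *p == '-') ++p;
  int digits = 0;
  while (std::isdigit(static_cast<unsigned char>(*p))) { ++p; ++digits; }
  if (*p == '.') {
    ++p;
    while (std::isdigit(static_cast<unsigned char>(*p))) { ++p; ++digits; }
  }
  if (digits == 0) return source;  // assumption: one digit is required

  if (*p == 'e' || *p == 'E') {
    const char* q = p + 1;
    if (*q == '+' || *q == '-') ++q;
    if (std::isdigit(static_cast<unsigned char>(*q))) {
      while (std::isdigit(static_cast<unsigned char>(*q))) ++q;
      p = q;  // accept the exponent only if it has digits
    }
  }
  return p;
}
```

For the input `  -12.5e+3xyz' the sketch returns a pointer to `xyz'; for `abc' it returns the source string unchanged.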
The functions can raise output exceptions as described above. The current rounding mode may affect the resulting floating point number. It is guaranteed that the transformation `IEEE floating point number -> string -> IEEE floating point number' yields the same IEEE floating point number if round to nearest mode is used. The reverse transformation `string with 9 (or 17) digits -> IEEE floating point number -> string' may yield different fraction digits in the ASCII representation, because one floating point number may correspond to several such strings that differ in the least significant digit. The ASCII representations are identical, however, when the functions for IEEE single, double, and quad precision numbers do not raise the imprecise result exception, or when fewer than 9 (17 or 36) fraction digits of the ASCII representations are compared.
`char *from_binary_string (const char *operand, int base)'
The functions are analogous to from_string but transform a
binary representation of the floating point number. The
number must conform to the following syntax
['+' | '-'] [<digits less base>] [ '.' [<digits less base>] ]
[ ('p' | 'P') ['+' | '-'] <decimal digits>]
or must be one of the strings `SNaN', `QNaN', `+Inf',
`-Inf', `+0', or `-0'. The functions return a pointer to the
first character in the source string after the floating point
number that was read. If the string does not conform to the
floating point number syntax, the result is zero and the
function returns the source string. The exponent (after
character `p' or `P') defines a power of two.
The functions can raise output exceptions as described above. The current rounding mode can affect the resulting floating point number if too many digits are given.
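The binary syntax can be illustrated by a sketch that evaluates such a string into a host `double'. It is not the package's algorithm: the names `digit_value' and `parse_binary_float' are hypothetical, the special strings are not handled, at least one digit is assumed to be required, and rounding is whatever the host arithmetic does when more digits are given than a `double' can hold.

```cpp
#include <cassert>
#include <cmath>

// Value of `c` as a digit, or -1 if it is not a digit below `base`.
int digit_value(char c, int base) {
  int v = -1;
  if (c >= '0' && c <= '9') v = c - '0';
  else if (c >= 'a' && c <= 'f') v = c - 'a' + 10;
  else if (c >= 'A' && c <= 'F') v = c - 'A' + 10;
  return (v >= 0 && v < base) ? v : -1;
}

// Parses [sign] digits [. digits] [p [sign] decimal digits] in the given
// base. Returns a pointer past the number, or `text` if none is found.
const char* parse_binary_float(const char* text, int base, double* value) {
  assert(text != nullptr && value != nullptr);
  assert(base == 2 || base == 4 || base == 8 || base == 16);
  const char* p = text;
  bool negative = false;
  if (*p == '+' || *p == '-') { negative = (*p == '-'); ++p; }
  double mantissa = 0.0;
  int digits = 0;
  while (digit_value(*p, base) >= 0) {
    mantissa = mantissa * base + digit_value(*p, base);
    ++p; ++digits;
  }
  if (*p == '.') {
    ++p;
    double scale = 1.0 / base;
    while (digit_value(*p, base) >= 0) {
      mantissa += digit_value(*p, base) * scale;
      scale /= base;
      ++p; ++digits;
    }
  }
  if (digits == 0) { *value = 0.0; return text; }  // no number found
  int exponent = 0;
  if (*p == 'p' || *p == 'P') {
    const char* q = p + 1;
    bool exponent_negative = false;
    if (*q == '+' || *q == '-') { exponent_negative = (*q == '-'); ++q; }
    if (*q >= '0' && *q <= '9') {
      for (; *q >= '0' && *q <= '9'; ++q) {
        // Cap the exponent so that absurdly long inputs cannot overflow.
        if (exponent < 100000) exponent = exponent * 10 + (*q - '0');
      }
      if (exponent_negative) exponent = -exponent;
      p = q;
    }
  }
  // The exponent is a power of two, whatever the base of the digits.
  *value = std::ldexp(negative ? -mantissa : mantissa, exponent);
  return p;
}
```

For example, `1.8p3' in base 16 is 1.5 times 2 to the power 3, that is 12, and `-101.1p-1' in base 2 is -5.5 divided by 2, that is -2.75.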
In class `IEEE_float'
`class IEEE_double to_double (void)'
`class IEEE_quad to_quad (void)'
`class IEEE_float &from_signed_integer
(int size, const void *integer)'
`class IEEE_float &from_unsigned_integer
(int size,
const void *unsigned_integer)'
`void to_signed_integer (int size, void *integer)'
`void to_unsigned_integer (int size,
void *unsigned_integer)'
In class `IEEE_double'
`class IEEE_float to_single (void)'
`class IEEE_quad to_quad (void)'
`class IEEE_double &from_signed_integer
(int size,
const void *integer)'
`class IEEE_double &from_unsigned_integer
(int size,
const void *unsigned_integer)'
`void to_signed_integer (int size, void *integer)'
`void to_unsigned_integer (int size,
void *unsigned_integer)'
In class `IEEE_quad'
`class IEEE_float to_single (void)'
`class IEEE_double to_double (void)'
`class IEEE_quad &from_signed_integer
(int size,
const void *integer)'
`class IEEE_quad &from_unsigned_integer
(int size,
const void *unsigned_integer)'
`void to_signed_integer (int size, void *integer)'
`void to_unsigned_integer (int size,
void *unsigned_integer)'
No output exceptions occur during transformation of a single
precision floating point number to a double (or quad)
precision number, or of a double precision floating point
number to a quad precision number.
No input exceptions occur during transformation of integer
numbers to floating point numbers. Results and input
exceptions for special case operand values (including NaNs)
are given for conversion of a floating point number to a
signed integer in the following table:
    Operand    | Result & Exception
---------------|-------------------
     SNaN      |         0
               |   IEEE_INV(_RO)
---------------|-------------------
     QNaN      |         0
               |   IEEE_INV(_RO)
---------------|-------------------
     +Inf      |       IMax
               |     IEEE_INV
---------------|-------------------
     -Inf      |       IMin
               |     IEEE_INV
---------------|-------------------
    Others     |
               |
Results and input exceptions for special case operand values
(including NaNs) are given for conversion of a floating point
number to an unsigned integer in the following table:
    Operand    | Result & Exception
---------------|-------------------
     SNaN      |         0
               |   IEEE_INV(_RO)
---------------|-------------------
     QNaN      |         0
               |   IEEE_INV(_RO)
---------------|-------------------
     +Inf      |       IMax
               |     IEEE_INV
---------------|-------------------
   -Inf or     |         0
negative number|     IEEE_INV
---------------|-------------------
    Others     |
               |
Results and exceptions for NaNs during transformation of
floating point numbers to (unsigned) integers differ from
those for the operations of addition, multiplication, and
so on.
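The special cases of the conversion to a signed integer can be sketched for a 4-byte integer using host `double' values as stand-ins for the package's numbers. The names `SpecialConversion' and `special_to_int32' are hypothetical. The sketch assumes that `IEEE_INV(_RO)' means both exceptions are raised, and it deliberately leaves ordinary finite operands alone, since the table does not specify them.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

struct SpecialConversion {
  bool handled;           // false for "Others": not fixed by the table
  std::int32_t value;     // result when handled
  bool invalid;           // invalid operation exception raised
  bool reserved_operand;  // reserved operand exception raised
};

// Special cases of converting to a 4-byte signed integer.
SpecialConversion special_to_int32(double x) {
  if (std::isnan(x)) {
    // Unlike addition and the other operations, a NaN operand yields 0.
    return {true, 0, true, true};
  }
  if (std::isinf(x)) {
    const std::int32_t limit = x > 0
        ? std::numeric_limits<std::int32_t>::max()   // IMax
        : std::numeric_limits<std::int32_t>::min();  // IMin
    return {true, limit, true, false};
  }
  return {false, 0, false, false};  // ordinary operand, converted normally
}
```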
As mentioned above, package `arithm' provides template classes `sint' and `unsint'. Package `IEEE' therefore contains template functions for transformation between IEEE numbers and these integer numbers. As in package `arithm', if you define macro `NO_TEMPLATE' before inclusion of the interface file, these template transformation functions will be absent. There are the following functions:
`template <int size>
class IEEE_float &IEEE_float_from_unsint
(class IEEE_float &single,
class unsint<size> &unsigned_integer)'
`template <int size>
class IEEE_float &IEEE_float_from_sint
(class IEEE_float &single,
class sint<size> &integer)'
`template <int size>
void IEEE_float_to_sint (class IEEE_float &single,
class sint<size> &integer)'
`template <int size>
void IEEE_float_to_unsint
(class IEEE_float &single,
class unsint<size> &unsigned_integer)'
`template <int size>
class IEEE_double &IEEE_double_from_unsint
(class IEEE_double &single,
class unsint<size> &unsigned_integer)'
`template <int size>
class IEEE_double &IEEE_double_from_sint
(class IEEE_double &single,
class sint<size> &integer)'
`template <int size>
void IEEE_double_to_sint (class IEEE_double &single,
class sint<size> &integer)'
`template <int size>
void IEEE_double_to_unsint
(class IEEE_double &single,
class unsint<size> &unsigned_integer)'
`template <int size>
class IEEE_quad &IEEE_quad_from_unsint
(class IEEE_quad &single,
class unsint<size> &unsigned_integer)'
`template <int size>
class IEEE_quad &IEEE_quad_from_sint
(class IEEE_quad &single,
class sint<size> &integer)'
`template <int size>
void IEEE_quad_to_sint (class IEEE_quad &single,
class sint<size> &integer)'
`template <int size>
void IEEE_quad_to_unsint
(class IEEE_quad &single,
class unsint<size> &unsigned_integer)'
Exceptions for these functions are the same as described above
for functions `from_signed_integer', `to_signed_integer' and
so on.