SPRUT (internal representation description translator): Internal representation description language

2. Internal representation description language

IRD declares types of nodes of the graph. Nodes contains fields, part of them represents links between nodes, and another part of them stores attributes of arbitrary types. To make easy describing internal representation the IRD supports explicitly single inheritance in node types and also can model multiple inheritance. There can be several levels of internal representation description in separate files. The nodes of one level refer to the nodes of previous levels. Therefore each next level enriches source program internal representation.

2.1 Layout of internal representation description

To describe internal representation a special language is used. An internal representation description structure has the following layout which is similar to one of YACC file.


     DECLARATIONS
     %%
     TYPES OF NODES
     %%
     ADDITIONAL C/C++ CODE

The `%%' serves to separate the sections of description. All sections are optional. The first `%%' starts section of description of types of internal representation nodes and is obligatory even if the section is empty, the second `%%' may be absent if section of additional C/C++ code is absent too.

The section of declarations may contain names of predefined types of fields of internal representation nodes and names of types of double linked nodes. The section also contains name of original internal representation description if given file contains extension of an internal representation description. And finally the section may contains sections of code on C/C++.

The next section contains description of types of internal representation nodes.

The additional C/C++ code can contain any C/C++ code you want to use. Often functions which are not generated by the translator but are needed to work with internal representation go here. This code without changes is placed at the end of file generated by the translator.

2.2 Declarations

The section of declarations may contain the following construction.


     %type IDENTIFIER ...

All predefined types must be defined in constructions of such kind. The same name can be defined repeatedly. All references to node whose type is a sub-type of type with name present in the following construction will be double linked.


     %double IDENTIFIER ...

It means that SPI will generates functions (macros) which permit to examine all fields (may be in other nodes) described as of given node type (or its subtype) which refer to node of given type (or its sub-type), i.e. fields which are described as of given node type (or its subtype) will be double linked (see below). SPI will automatically maintain such double links. The simplest way to describe double linked graph is to insert construction


     %double %root

Last construction in the section of declarations of kind


     %extend IDENTIFIER

defines name (without suffix) of file containing source internal representation which is extended by given file. The file contains original internal representation if there is no one such construction. The original specification file has level 0, the extensions have level 1, 2, and so on. This feature permits sequentially to develop an internal representation and to save and restore any its level (see SPI) to additional tools, e.g. browsers. For example, there may be three levels of source program internal representation. The zero level representation may be internal representation for semantic analysis, the first level may be a low level machine-dependent internal representation, and the second level may be used to generate object code. Only first such construction in one file is essential. All subsequent such constructions are ignored.

There may be also the following constructions in the declaration section


     %local {
        C/C++ DECLARATIONS
     }

     %import {
        C/C++ DECLARATION
     }

         and

     %export {
        C/C++ DECLARATION
     }

which contain any C/C++ declarations (types, variables, macros, and so on) used in the description sections.

The local C/C++ declarations are inserted at the begin of generated implementation file (see SPI description) but after include-directive of interface file.

C/C++ declarations which start with `%import' are inserted at the begin of generated interface file. For example, such C/C++ code may contain C/C++ definitions of predefined types which are used in field declarations of node types.

C/C++ declarations which start with `%export' are inserted at the end of generated interface file. For example, such C/C++ code may contain definitions of external variables and functions which refer to node type representation (see type `IR_node_t' in SPI description).

All C/C++ declarations can redefine all type specific and internal macros (see SPI) because they are placed in implementation file. All C/C++ declarations are placed in the same order as in the section of declarations. C/C++ declarations from IRD file with smaller level number (see construction `%extend') are placed in interface or implementation files firstly.

2.3 Types of nodes

The section of declarations is followed by section defining internal representation node types. An internal node type is described by the following construction


     %abstract
     IDENTIFIER :: IDENTIFIER (or %root)
     CLASS FIELDS
     SKELETON FIELDS
     OTHER FIELDS

Keywords `%abstract' is optional. Node type description which starts with this keyword denotes abstract node type, i.e. node of such type does not exist in internal representation of any source program. Abstract nodes types serve only to description of common fields of several node types.

The first identifier defines name of internal representation node type, the second defines name of node type all declarations of fields of which are inherited into the given node type. In this construction the first node type is so called immediate super type, the second is immediate sub-type.

Node type A is a super-type of node type B (and node type B is a sub-type of node type A) iff node type A is immediate super-type of super-type of node type B. All node types are sub-types of implicitly declared node type with name `%root'. There are also nodes of special type (error nodes). Type of these nodes are believed to be sub-type of all declared node types. The definition of type of error nodes are absent in any internal representation description.

The identifier of immediate super-type (with `::') can be absent. In this case the construction is continuation of given type node declaration. There can be only the single main node type declaration and many its continuations. The order of main node type declaration and its continuations can be arbitrary. The continuations can not start with keyword `%abstract'.

Construction of the following kind


     A, B, ... :: C
     DECLARATIONS OF FIELDS

is abbreviation of the following constructions


     A :: C
     DECLARATIONS OF FIELDS
     B :: C
     DECLARATIONS OF FIELDS
     ...

The fields are sub-divided on kinds. There are three kinds of fields.

Class fields. There is the single instance of a class field for all nodes of given type.
Skeleton fields. There is the single instance of a skeleton fields for each node of given type. The value of skeleton field is to be given by user at the node creation moment (see SPI description).
Other fields. This kind of fields is analogous to one of skeleton fields but value of such field is not given by user at the node creation moment.

Optional sections of declarations of class fields, skeleton fields, and other fields start correspondingly with keywords `%class', `%skeleton', and `%other'.


     %class LIST OF DECLARATIONS OF FIELDS

     %skeleton LIST OF DECLARATIONS OF FIELDS

     %other LIST OF DECLARATIONS OF FIELDS

These keywords are followed by may be empty list of declarations of fields. The list elements are field declaration or target code. Field declaration is described by the following construction:


     IDENTIFIER : FIELD  %double  TYPE  CONSTRAINTS  ACTIONS

Identifier is name of given field. All declarations of fields of a node type must have unique names. But declarations of fields of different node types can have the same names if the types of such fields are the same, the fields are simultaneously described as double linked or not, and all such fields are class or any non-class fields. Owing to the feature it is possible to model multiple inheritance. Semicolon `:' is followed by the field type. There are the following types of nodes fields:

identifier from a clause `%type'. The identifier represents predefined type and must be declared anywhere by the construction `typedef' of C/C++.
node name. This type represents arc to a node of given type or its subtypes and is implemented by pointer of C/C++.

These constructions can have optional clause `%double'. Its sense is slightly different from the one in the declarations section. This clause means that SPI will generate functions (macros) which permits to examine all nodes of the declared type (or its sub-type) which refer through given field to a node of the field type (or its sub-type). The clause `%double' can not be given for class fields.

The field type may be followed by constraints and actions in any order. The constraint is usually present for non-null value node reference. The constraint is a boolean expression on C/C++ in brackets `[' and `]'. The constraints are tested with the aid of some generated functions in the same order as they are present in corresponding type node declaration. The actions can contain any statements on C/C++ in figure brackets `{' and `}' (the brackets are also output therefore C/C++ declarations can be in actions). The actions for skeleton and other fields are fulfilled at the node creation time in the same order as they are present in corresponding type node declaration. The actions for class fields are fulfilled at the internal representation initiation time.

Constructions `$$', `$' in the constraints and the actions represent correspondingly current node and previous field. But `$' is not changed by the previous field if the previous field in given node type declaration is of other kind than the constraint or action (e.g. the field of skeleton kind and the constraint of class kind) or such field does not exist in the current node type declaration. Also it should be remembered that `$' can not be used in left hand side of assignment if the construction represent double linked fields.

The construction `IDENTIFIER : FIELD TYPE' may be absent. This case is convenient for definition of additional constraints and actions for fields which declared in a super-type of given node type.

Construction of kind


     A, B, ... : C ...

is abbreviation of the following constructions


     A : C ...
     B : C ...

Full YACC syntax of internal representation description language is placed in Appendix 1.

Next Previous Contents