The MSTA description language is a superset of the YACC language. The major additional features are Extended Backus-Naur Form (EBNF) for more convenient description of languages, additional constructions in rules for more convenient description of scanners, and named attributes.
An MSTA description has the following layout, which is similar to that of a YACC file.
DECLARATIONS
%%
RULES
%%
ADDITIONAL C/C++ CODE
The `%%' serves to separate the sections of the description. All sections
are optional. The first `%%' starts the section of rules and is
obligatory even if the section is empty; the second `%%' may be absent
if the section of additional C/C++ code is absent too.
The full syntax of the MSTA description file, written in YACC notation, is given in Appendix 1.
The section of declarations may contain the following construction:
%start identifier
which determines the axiom of the grammar. If such a construction is
absent, the axiom is taken to be the nonterminal in the left hand side
of the first rule. If there are several such constructions, all
except the first are ignored.
By default, the attribute values of terminals (tokens) and nonterminals are integers. To use values of other types, write
<tag>
in the constructions declaring symbols (%token, %type, %left, ...) and
declare the corresponding union member names in the following construction:
%union { body of union in C/C++ }
Alternatively, the union can be declared in the interface file, with a
typedef defining the symbol YYSTYPE (see generated code) to
represent this union. The effect of %union is to provide the
declaration of YYSTYPE directly from the input.
The following group of declarators takes token (terminal) or nonterminal names as arguments.
%token [<tag>] name [number] [name [number]]...
%left [<tag>] name [number] [name [number]]...
%right [<tag>] name [number] [name [number]]...
%nonassoc [<tag>] name [number] [name [number]]...
%type <tag> name...
The names can optionally be preceded by the name of a C/C++ union
member (called a tag, see above) appearing within ``<'' and ``>''. The
use of a tag specifies that the tokens or nonterminals named in this
construction are to be of the same C/C++ type as the union member
referenced by the tag.
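As an illustration of tags, the following declarations sketch a grammar whose attributes have two different types. The union member names ival and dval and the symbol names INTEGER, REAL, and expr are hypothetical, chosen only for this example.

```yacc
%union {
  int ival;       /* attribute of integer tokens */
  double dval;    /* attribute of real tokens and of expressions */
}
%token <ival> INTEGER
%token <dval> REAL
%type  <dval> expr
```

With these declarations, the attribute of an INTEGER token has type int, and the attributes of REAL tokens and of the nonterminal expr have type double.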
If a symbol used in the grammar is not defined by a %token, %left, %right, or %nonassoc declaration, the symbol is considered to be a nonterminal.
The first occurrence of a given token can be followed by a positive integer in the constructions `%token', `%left', `%right', and `%nonassoc' defining tokens. In this case the value assigned to it shall be the code of the corresponding token returned by the scanner.
The constructions `%left', `%right', and `%nonassoc' assign precedence and associativity to the corresponding tokens. All tokens in the same construction have the same precedence level and associativity; the constructions should be placed in order of increasing precedence. The construction `%left' denotes that the operators (tokens) in that construction are left associative, and the construction `%right' similarly denotes right associative operators.
The construction `%nonassoc' means that the tokens cannot be used associatively. If the parser encounters an associative use of such a token, it will report an error.
The construction `%type' means that the attributes of the corresponding nonterminals are of the type given in the tag field.
Once the type, precedence, or token number of a symbol is specified, it shall not be changed. If the first declaration of a token does not assign a token number, MSTA will assign a token number. Once this assignment is made, the token number shall not be changed by explicit assignment.
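A short sketch of token number and precedence declarations follows. The token name NUMBER and its code 300 are hypothetical; the operator declarations are listed from the lowest precedence level to the highest.

```yacc
%token NUMBER 300     /* the scanner must return 300 for NUMBER */
%nonassoc '<' '>'     /* lowest level: a < b < c is reported as an error */
%left '+' '-'         /* a - b - c is parsed as (a - b) - c */
%left '*' '/'
%right '^'            /* highest level: a ^ b ^ c is parsed as a ^ (b ^ c) */
```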
Real grammars usually cannot be written without shift/reduce conflicts. To check the expected number of shift/reduce conflicts, the following construction can be used.
%expect number
If such a construction is present, MSTA will report an error if the number
of shift/reduce conflicts differs from the number given in the construction.
Remember that this is not a standard YACC construction.
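For example, the classical if-then-else grammar below has a single shift/reduce conflict in a traditional YACC implementation with one token of look ahead, so the description declares that one conflict is expected. The token and nonterminal names are hypothetical, and the sketch assumes that MSTA counts this conflict in the same way.

```yacc
%expect 1
%token IF ELSE EXPR OTHER
%%
stmt : IF EXPR stmt
     | IF EXPR stmt ELSE stmt
     | OTHER
     ;
```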
The following construction in declarations means that the scanner should be generated.
%scanner
There are the following major differences in parser and scanner
generated by MSTA
There may be also the following constructions in the declaration section
%{
C/C++ DECLARATIONS
%}
%local {
C/C++ DECLARATIONS
}
%import {
C/C++ DECLARATION
}
and
%export {
C/C++ DECLARATION
}
which contain any C/C++ declarations (types, variables, macros, and so
on) used in the sections. Remember that only the first construction is
a standard POSIX YACC construction.
The local C/C++ declarations are inserted at the beginning of the generated implementation file (see section `generated code') but after the include-directive of the interface file (if present, see MSTA Usage). You can also use the more traditional YACC construction %{ ... %} instead.
C/C++ declarations which start with `%import' are inserted at the beginning of the generated interface file. If the interface file is not generated, the code is inserted at the beginning of the part of the implementation file which would correspond to the interface file.
C/C++ declarations which start with `%export' are inserted at the end of the generated interface file. For example, such exported C/C++ code may contain definitions of external variables and functions which refer to definitions generated by MSTA. If the interface file is not generated, the code is inserted at the end of the part of the implementation file which would correspond to the interface file.
All C/C++ declarations are placed in the same order as in the section of declarations.
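The placement rules above can be sketched with a small declarations section. The variable name line_number is hypothetical and serves only to show where each piece of code would end up.

```yacc
%import {
#include <stdio.h>          /* goes to the beginning of the interface file */
}
%export {
extern int line_number;     /* goes to the end of the interface file */
}
%local {
int line_number = 1;        /* goes to the implementation file only */
}
```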
The section of declarations is followed by the section of rules.
The rules section defines the context-free grammar to be accepted by the function that MSTA generates, and associates C/C++ actions and additional precedence information with the rules. The grammar is described below, and a formal definition follows.
The rules section contains one or more grammar rules. A grammar rule has the following form:
nonterminal : pattern ;
The nonterminal in the left hand side of the rule denotes a language
construction, and the pattern describes what the nonterminal derives. The
semicolon at the end of the rule can be absent.
MSTA can use EBNF (Extended Backus-Naur Form) to describe the patterns. Because a pattern can be quite complex, MSTA internally transforms the rules in the description into simple rules and assigns a unique number to each simple rule. A simple rule can contain only a sequence of nonterminals and tokens. The simple rules and the numbers assigned to them appear in the description file (see MSTA usage). To obtain the simple rules, MSTA makes the following transformations (in the given order).
nonterminal : pattern1 | pattern2
is transformed into
nonterminal : pattern1
nonterminal : pattern2
nonterminal : ... pattern / s_pattern ...
is transformed into
nonterminal : ... N ...
N : N s_pattern pattern
N : pattern
N denotes here a new nonterminal created during the
transformation. This construction is very convenient for
describing lists with separators, e.g. identifiers separated
by commas. Remember that lists are not a feature of standard
POSIX YACC.
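As a sketch of the list construction, the rule below describes one or more identifiers separated by commas; the comment shows the simple rules that the transformation would be expected to yield. The names id_list and IDENTIFIER are hypothetical, and N stands for the nonterminal that MSTA creates.

```yacc
id_list : IDENTIFIER / ','

/* expected simple rules after the transformation:
     id_list : N
     N : N ',' IDENTIFIER
     N : IDENTIFIER
*/
```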
nonterminal : ... N @ identifier ...
is transformed into
nonterminal : ... N ...
Here N denotes a nonterminal, a token, or one of the constructions
described below. In actions, the identifier can be used instead of
a number to refer to the attribute of the nonterminal, the
token, or the nonterminal which is created during the transformation of
those constructions. Remember that attribute naming is not
a feature of standard POSIX YACC.
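A minimal sketch of attribute naming follows. It assumes that a named attribute is referenced in an action as $ followed by the identifier, by analogy with the numbered form; the names sum, term, left, and right are hypothetical.

```yacc
sum : sum @ left '+' term @ right   { $$ = $left + $right; }
    | term                          { $$ = $1; }
    ;
```

Compared with $1 and $3, the named references stay valid if elements are later inserted into the pattern.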
nonterminal : ... [ pattern ] ...
is transformed into
nonterminal : ... N ...
N : pattern
N :
N denotes here a new nonterminal created during the
transformation. This construction is very convenient for
describing optional constructions. Remember that the
optional construction is not a feature of standard POSIX YACC.
nonterminal : ... pattern * ...
is transformed into
nonterminal : ... N ...
N : N pattern
N :
N denotes here a new nonterminal created during the
transformation. This construction is very convenient for
describing zero or more occurrences of the pattern. Remember that
optional repetition is not a feature of standard POSIX YACC.
nonterminal : ... pattern + ...
is transformed into
nonterminal : ... N ...
N : N pattern
N : pattern
N denotes here a new nonterminal created during the
transformation. This construction is very convenient for
describing one or more occurrences of the pattern. Remember that
repetition is not a feature of standard POSIX YACC.
nonterminal : ... ( pattern ) ...
is transformed into
nonterminal : ... N ...
N : pattern
N denotes here a new nonterminal created during the
transformation. This construction is necessary to change the
priority of the transformations. Remember that grouping is
not a feature of standard POSIX YACC.
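The optional, repetition, and grouping constructions can be sketched together as follows. All symbol names are hypothetical, and the sketch assumes, as is usual for EBNF, that `*' and `+' apply to the immediately preceding element.

```yacc
declaration : type IDENTIFIER [ '=' expression ] ';'   /* optional initializer */
block       : '{' statement * '}'                      /* zero or more statements */
number      : DIGIT +                                  /* one or more digits */
arguments   : expression ( ',' expression ) *          /* grouping a sequence for repetition */
```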
nonterminal : ... string ...
is transformed into
nonterminal : ... '1st char' '2nd char' ... 'last char' ...
Here the string is simply the sequence of its characters written as MSTA
literals. Remember that strings are not a standard feature
of POSIX YACC.
nonterminal : ... token1 - tokenN ...
is transformed into
nonterminal : N
N : token1
N : token2
...
N : tokenN
N denotes here a new nonterminal created during the
transformation. The range is simply any token with a code
between the code of token1 and the code of tokenN (inclusive). The
code of token1 must be less than or equal to the code of tokenN.
Remember that ranges are not a feature of standard POSIX
YACC.
nonterminal : ... token1 <- tokenN ...
is transformed into
nonterminal : N
N : token2
N : token3
...
N : tokenN
N denotes here a new nonterminal created during the
transformation. The left open range is simply any token with a
code between the code of token1 + 1 and the code of tokenN
(inclusive). The code of token1 must be less than the code of
tokenN. Remember that ranges are not a feature of standard
POSIX YACC.
nonterminal : ... token1 -> tokenN ...
is transformed into
nonterminal : N
N : token1
N : token2
...
N : tokenN-1
N denotes here a new nonterminal created during the
transformation. The right open range is simply any token with a
code between the code of token1 and the code of tokenN - 1
(inclusive). The code of token1 must be less than the code of
tokenN. Remember that ranges are not a feature of standard
POSIX YACC.
nonterminal : ... token1 <-> tokenN ...
is transformed into
nonterminal : N
N : token2
N : token3
...
N : tokenN-1
N denotes here a new nonterminal created during the
transformation. The left right open range is simply any token
with a code between the code of token1 + 1 and the code of tokenN - 1
(inclusive). The code of token1 must be less than the code of
tokenN - 1. Remember that ranges are not a feature of
standard POSIX YACC.
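The four kinds of range can be compared side by side in the sketch below. It assumes that the code of a character literal is the code of the character, as in YACC; the nonterminal names are hypothetical, and each comment lists the tokens the range stands for.

```yacc
digit    : '0' - '9'       /* '0', '1', ..., '9' */
after_a  : 'a' <- 'd'      /* 'b', 'c', 'd' */
before_d : 'a' -> 'd'      /* 'a', 'b', 'c' */
inside   : 'a' <-> 'd'     /* 'b', 'c' */
```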
nonterminal : ... action something non empty
is transformed into
nonterminal : ... N something non empty
N : action
N denotes here a new nonterminal created during the
transformation. The action is an arbitrary C/C++ block, i.e. declarations and statements enclosed in curly braces { and }. Certain pseudo-variables can be used in the action for attribute references. These are replaced by references to data structures known internally to MSTA. The pseudo-variables have the following forms:
$$
This pseudo-variable denotes the attribute of the nonterminal in the left hand side of the simple rule.
$number
This pseudo-variable refers to the attribute of the sequence element (nonterminal, token, or action) specified by its number in the right hand side of the rule, reading from left to right and counting before actions inside the pattern are replaced (see the transformation above). The number can be zero or negative. If it is, it refers to the attribute of a symbol (token or nonterminal) on the parser's stack preceding the leftmost symbol of the rule. (That is, $0 refers to the attribute of the symbol immediately preceding the leftmost symbol in the rule, to be found on the parser's stack, and $-1 refers to the symbol to its left.) If the number refers to an element past the current point in the rule (i.e. past the action), or beyond the bottom of the stack, the result is undefined.
$identifier
This pseudo-variable is analogous to the previous one, but the attribute name is used instead of the number. The attribute must, of course, have been named.
$<tag>number
This pseudo-variable is used when there are attributes of different types in the grammar and the number corresponds to a nonterminal whose type is not known because the nonterminal has been generated during the transformation of rules into simple rules. The type of the attribute is given in angle brackets.
$<tag>identifier
This pseudo-variable is analogous to the previous one, but the attribute name is used instead of the number. The attribute must, of course, have been named.
$<tag>$
This pseudo-variable is used when there are attributes of different types in the grammar and the type of the nonterminal in the left hand side is not known because the nonterminal has been generated during the transformation of rules into simple rules.
The optional construction `%prec ...' can be used to change the precedence level associated with a particular simple rule. This is useful when a unary and a binary operator have the same symbolic representation but need different precedences. The reserved keyword `%prec' can be followed by a token identifier or a literal. It causes the precedence of the grammar rule to become that of the following token identifier or literal.
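The classical use of `%prec' is unary minus, sketched below. The token names NUMBER and UMINUS are hypothetical; UMINUS is never returned by the scanner and is declared only to name the highest precedence level.

```yacc
%token NUMBER
%left '+' '-'
%left '*' '/'
%nonassoc UMINUS     /* highest precedence level, used only by %prec */
%%
expr : expr '+' expr
     | expr '-' expr
     | expr '*' expr
     | expr '/' expr
     | '-' expr %prec UMINUS
     | NUMBER
     ;
```

Without `%prec UMINUS' the last-but-one rule would take the precedence of the binary '-', so that - a * b would be parsed as - (a * b).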
The optional construction `%la number' can be used to change the maximal look ahead associated with a particular simple rule. For example, suppose a grammar contains the classical if-then-else conflict, which is resolved correctly with a look ahead of 1, and also a rule whose conflict must be resolved with a look ahead of 3. In this case you can call MSTA with a maximal look ahead of 1 (the default) and place %la 3 in the rule which takes part in the conflict that needs a look ahead of 3.
If a program section follows, the grammar rules shall be terminated by %%.