The following figure shows the major algorithms MSTA can use to generate parsers:
  1. Building  ----->  4. LALR-optimization  ----->  5. regular-optimization
     LR(k)-sets                ^                               |
                               |                               |
  2. Building LALR(k)-sets --->|                               V
     by generalized            |                      6. equivalent tokens
     DeRemer algorithm         |
                               |
  3. Building LALR(k)-sets --->|
     by general algorithm
MSTA can generate LR(k)-parsers (see MSTA usage). After building the canonical LR(k)-sets, MSTA usually performs LALR-optimization, which significantly decreases the size of the resulting parser. You can switch LALR-optimization off, but my advice is to use it. This optimization not only produces a smaller parser, but also significantly speeds up MSTA itself and decreases its memory requirements (because before and during LALR-optimization, LR-sets are represented only by their essential LR-situations).
LALR-optimization searches for equivalent LR-sets, i.e. LR-sets with the same core (the set of LR-situations with their contexts removed), and merges them. In other words, LALR-optimization extracts the LALR-parts of grammars and implements parsing them by adequate methods. If the input grammar describes an LALR-language, the resulting LR-sets will be LALR-sets. The optimization algorithm is analogous to searching for a minimal DFA (deterministic finite automaton). I described the optimization in an article in a Russian journal (Programming, 1989), unfortunately available only in Russian. I have no time or space to describe it in more detail here.
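The merging step can be sketched in Python. The data representation below (a situation as a (rule, dot, context) triple) is my illustration, not MSTA's internal one:

```python
from collections import defaultdict

def lalr_merge(lr_sets):
    """Merge LR(k)-sets that share a core, uniting the contexts of
    matching LR-situations.  A situation is a (rule, dot, context)
    triple; the core drops the context component."""
    merged = defaultdict(lambda: defaultdict(set))
    for lr_set in lr_sets:
        core = frozenset((rule, dot) for rule, dot, _ in lr_set)
        for rule, dot, ctx in lr_set:
            merged[core][(rule, dot)].update(ctx)
    return [{(rule, dot, frozenset(ctx)) for (rule, dot), ctx in items.items()}
            for items in merged.values()]
```

Two LR-sets differing only in contexts collapse into one LALR-set whose contexts are the union of the originals, just as two states of a DFA are merged when they are equivalent.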
Usually MSTA generates LALR(k)-parsers with a generalized DeRemer algorithm. But when you want to see the contexts of LR-situations in the description file (see MSTA usage), MSTA will use the canonical algorithm of building LALR-sets (see, for example, the old book of Aho and Ullman). Remember that this algorithm is several times slower than the DeRemer one.
When k > 1, the lookahead length of the parser can be less than k. The parser generated by MSTA always looks ahead only the minimal number of tokens necessary to correctly resolve the conflicts in the given state for the given input.
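The idea of minimal lookahead can be sketched as follows; the representation of conflicting actions and their context strings is a hypothetical illustration, not MSTA's:

```python
def minimal_lookahead(action_contexts, k_max):
    """Return the smallest k <= k_max such that length-k prefixes of the
    context strings distinguish every pair of conflicting actions."""
    for k in range(1, k_max + 1):
        prefixes = {a: {c[:k] for c in cs} for a, cs in action_contexts.items()}
        actions = list(prefixes)
        if all(prefixes[a].isdisjoint(prefixes[b])
               for i, a in enumerate(actions) for b in actions[i + 1:]):
            return k
    return k_max
```

For example, a shift with context "a b" against a reduce with context "a c" needs two tokens of lookahead, but against context "x c" one token already decides.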
After building the LR-sets, MSTA usually runs a pass of so-called regular-optimization. Unfortunately, I have not described this algorithm anywhere. The idea of the algorithm is to search for all parser transitions from one state to another which are independent of the parser state (more precisely, of the parser stack). In other words, regular-optimization extracts the regular parts of grammars and implements parsing them by adequate methods. As a result, the generated parser will be faster and will use the parser stack less.
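As an extreme illustration of the idea (my sketch, not MSTA output): a fully regular fragment, such as identifiers of the form letter (letter | digit)*, can be recognized by a plain DFA that never touches the stack:

```python
def is_identifier(text):
    """A two-state DFA for letter (letter | digit)*; no stack is needed."""
    state = "start"
    for ch in text:
        if state == "start" and (ch.isalpha() or ch == "_"):
            state = "ident"
        elif state == "ident" and (ch.isalnum() or ch == "_"):
            pass  # stay in the accepting state
        else:
            return False
    return state == "ident"
```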
If the input grammar describes a regular language, the resulting parser will not use the stack at all. This permits using MSTA for the generation of efficient scanners too. Moreover, a scanner for a language with non-regular parts (e.g. nested comments) is described much more simply in MSTA and is implemented efficiently by MSTA. To extract more regular parts, splitting of LR-sets can be used (this is the default for `%scanner'). Usage of splitting LR-sets is not recommended for grammars of typical programming languages, because it requires a great deal of memory during the optimizations.
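Nested comments are a good example of such a non-regular part: everything is regular except the nesting depth. A hand sketch (mine, not generated code) shows how little machinery is needed beyond a DFA:

```python
def skip_nested_comment(text, pos):
    """Scan a nested comment starting at text[pos:] (assumed to be '/*').
    The only non-regular piece is the nesting depth, kept in a counter
    rather than a full parser stack."""
    assert text.startswith("/*", pos)
    depth, i = 0, pos
    while i < len(text):
        if text.startswith("/*", i):
            depth += 1; i += 2
        elif text.startswith("*/", i):
            depth -= 1; i += 2
            if depth == 0:
                return i          # index just past the closing '*/'
        else:
            i += 1
    raise SyntaxError("unterminated comment")
```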
The implementation of regular-optimization requires more kinds of instructions than the classical LR-parser's (not only shift, reduce, and error). This means that the MSTA parser implementation is more oriented to a compilation model than classical YACC or BISON. This also speeds up the parser generated by MSTA. The new instructions are "(pop)-(shift)-goto" (optional parts are in parentheses here). Moreover, more than one action, for different rules, can be executed during one such instruction. Also, each parser state has a flag indicating whether the resulting state and/or the corresponding attribute must be pushed onto the state and attribute stacks. An MSTA parser generated with the regular optimization may contain several copies of rule actions, but usually this only slightly increases the size of the parser.
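A minimal sketch of the combined instruction shape, assuming a state stack of plain integers (this encoding is mine, not MSTA's actual tables):

```python
def execute(instr, stack, next_token, token):
    """Execute one (pop)-(shift)-goto instruction: optionally pop some
    states, optionally shift to the next input token, then go to a state."""
    pop_n, do_shift, goto_state = instr
    if pop_n:
        del stack[-pop_n:]          # the optional pop part
    if do_shift:
        token = next_token()        # the optional shift part
    stack.append(goto_state)        # the mandatory goto part
    return token
```

Fusing pop, shift, and goto into one instruction lets the generated parser do in one step what a classical shift-reduce interpreter does in several.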
MSTA also searches for equivalent tokens to decrease the size of the generated parser. This optimization is especially effective for scanners.
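Equivalent tokens are tokens whose columns in the parser tables coincide. A sketch of the search, with the table represented as a list of per-state action dicts (my representation, chosen for clarity):

```python
def equivalent_token_classes(action_table, tokens):
    """Group tokens with identical columns in the action table; the
    parser can then keep a single column per class and map each token
    to its class number."""
    by_column = {}
    for tok in tokens:
        column = tuple(row.get(tok) for row in action_table)
        by_column.setdefault(column, []).append(tok)
    return list(by_column.values())
```

For a scanner, where many characters (e.g. all digits) behave identically in every state, this can shrink the tables dramatically.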