The following figure shows the major algorithms MSTA can use to generate parsers:
   1. Building      ----->  4. LALR-         ----->  5. regular-
      LR(k)-sets       ^       optimization             optimization
                       |                                     |
   2. Building         |                                     V
      LALR(k)-sets  ---+                            6. equivalent tokens
      by generalized   |
      DeRemer          |
      algorithm        |
                       |
   3. Building         |
      LALR(k)-sets  ---+
      by general algorithm
MSTA can generate LR(k) parsers (see MSTA usage). After building the canonical LR(k)-sets, MSTA usually performs LALR-optimization, which significantly decreases the size of the resulting parser. You can switch LALR-optimization off, but my advice is to use it. This optimization not only produces a smaller parser, but also significantly speeds up MSTA itself and decreases its memory requirements (because before and during LALR-optimization, LR-sets are represented only by their essential LR-situations).
LALR-optimization searches for equivalent LR-sets with the same LR-core (the LR-set with its LR-situations taken without contexts) and merges them. In other words, LALR-optimization extracts the LALR parts of a grammar and implements parsing them by adequate methods. If the input grammar describes an LALR language, the resulting LR-sets will be LALR-sets. The optimization algorithm is analogous to searching for a minimal DFA (deterministic finite automaton). There is my article describing the optimization in a Russian journal (Programming, 1989), unfortunately only in Russian. Now I have no time and place to describe it in more detail here.
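The merging step can be sketched as follows. This is an illustration of the idea only, not MSTA's actual code: LR-sets are represented here as dictionaries mapping a core item (a rule with a dot position) to its context set, and sets with equal cores are merged by unioning the contexts.

```python
def merge_by_core(lr_sets):
    """lr_sets: list of dicts mapping a core item to its context set.
    Merges sets whose cores (key sets, i.e. situations without
    contexts) coincide, unioning the contexts."""
    merged = {}
    for s in lr_sets:
        core = frozenset(s)                    # the set without contexts
        if core not in merged:
            merged[core] = {item: set(ctx) for item, ctx in s.items()}
        else:                                  # same core: union contexts
            for item, ctx in s.items():
                merged[core][item] |= ctx
    return list(merged.values())

# Two canonical LR(1) sets with the same core but different contexts:
s1 = {"E -> E . + T": {"$"}, "T -> . F": {"+"}}
s2 = {"E -> E . + T": {")"}, "T -> . F": {"*"}}
merged = merge_by_core([s1, s2])   # one LALR-set with unioned contexts
```

After merging, only one set remains, carrying the union of the two context sets for each situation.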
Usually MSTA generates LALR(k) parsers with a generalized DeRemer algorithm. But when you want to see the contexts of LR-situations in the description file (see MSTA usage), MSTA uses the canonical algorithm for building LALR-sets (see, for example, the old book by Aho and Ullman). Remember that this algorithm is several times slower than the DeRemer one.
When k > 1, the lookahead length of the parser can be less than k. The parser generated by MSTA always looks ahead only the minimal number of tokens necessary for correct resolution of the conflicts in the given state on the given input.
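The minimal-lookahead behaviour can be illustrated by the following sketch. The conflict table here is invented for illustration: in a conflicting state, the parser examines lookahead tokens one by one and stops as soon as only a single action remains possible, so even in an LR(k) parser the full k tokens are rarely consulted.

```python
def resolve(actions, lookahead):
    """actions: dict mapping a k-token context tuple to an action.
    Examines lookahead tokens one at a time; returns the chosen
    action and how many tokens were actually needed."""
    candidates = list(actions.items())
    for i, tok in enumerate(lookahead):
        candidates = [(ctx, a) for ctx, a in candidates if ctx[i] == tok]
        if len({a for _, a in candidates}) == 1:   # conflict resolved
            return candidates[0][1], i + 1
    raise ValueError("conflict not resolvable with this lookahead")

# A hypothetical LR(2) conflict: on "a" one token suffices,
# on "b" the second token must be examined as well.
actions = {("a", "x"): "shift",
           ("b", "x"): "reduce r1",
           ("b", "y"): "reduce r2"}
```

For example, `resolve(actions, ("a", "x"))` stops after one token, while `resolve(actions, ("b", "y"))` needs both.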
After building the LR-sets, MSTA usually runs a pass of so-called regular-optimization. Unfortunately, I have not described this algorithm anywhere. The idea of the algorithm is to search for all transitions of the parser from one state to another which are independent of the parser state (more precisely, of the parser stack). In other words, regular-optimization extracts the regular parts of a grammar and implements parsing them by adequate methods. As a result, the generated parser is faster and uses the parser stack less.
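One observation behind this optimization can be sketched very roughly (this much-simplified, conservative illustration is not MSTA's algorithm): a state has to be pushed onto the parser stack only if some reduce may later uncover it and consult its goto row; states whose goto rows are never consulted need not be pushed at all.

```python
def states_needing_push(goto, reduced_nonterminals):
    """goto: dict {(state, nonterminal): target_state}.
    reduced_nonterminals: nonterminals some rule actually reduces to.
    Conservative sketch: a state must stay on the stack only if it
    has a goto entry for a nonterminal that is ever reduced to."""
    return {s for (s, nt) in goto if nt in reduced_nonterminals}
```

A real implementation would also analyze which states can actually be uncovered by which reduces; this version merely shows why some pushes can be proven unnecessary.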
If the input grammar describes a regular language, the resulting parser will not use the stack at all. This permits using MSTA for the generation of efficient scanners too. Moreover, a scanner for a language with non-regular parts (e.g. nested comments) is described much more simply in MSTA and is implemented efficiently by MSTA. To extract more regular parts, splitting of LR-sets can be used (this is the default for `%scanner'). Usage of LR-set splitting is not recommended for grammars of usual programming languages because it requires very much memory during the optimizations.
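Nested comments are the classic non-regular part: matching the nesting requires an unbounded counter (equivalently, a stack), which is exactly what an MSTA-generated scanner gets from its parser stack. A hand-written sketch of the same job:

```python
def skip_nested_comment(text, pos):
    """text[pos:] starts just after an opening '/*'.
    Returns the position just after the matching '*/'.
    The depth counter plays the role of the parser stack."""
    depth = 1
    while depth:
        if pos >= len(text):
            raise ValueError("unterminated comment")
        if text.startswith("/*", pos):
            depth += 1; pos += 2
        elif text.startswith("*/", pos):
            depth -= 1; pos += 2
        else:
            pos += 1
    return pos
```

No finite automaton (and hence no purely regular scanner description) can track `depth` for arbitrary nesting, which is why such constructs are awkward in classical scanner generators.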
Implementation of regular-optimization requires more kinds of instructions than the classical LR parser has (not only shift, reduce, and error). This means that the MSTA parser implementation is more oriented to a compilation model than classical YACC or BISON. This also speeds up the parser generated by MSTA. New instructions of the form "(pop) (shift) goto" (the optional parts are in parentheses here) are added. Moreover, more than one action, for different rules, can be executed during one such instruction. Also, each parser state has a status saying whether the resulting state and/or the corresponding attribute must be pushed onto the state and attribute stacks. A parser generated by MSTA with regular-optimization may have several copies of rule actions, but usually this only slightly increases the size of the parser.
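The combined instruction form can be illustrated with an invented toy driver (the instruction encoding and the tiny program below are my own illustration, not MSTA's generated code). Each instruction can pop several states, optionally push the current state, optionally shift the token, run the actions of several rules, and go to the next state, all in one step:

```python
def run(program, tokens):
    """program: {state: {token: (pop, push, shift, actions, goto)}}.
    Executes combined "(pop)(shift)goto" instructions until the
    accept state; returns the trace of rule actions and the stack."""
    stack, state, pos, trace = [], 0, 0, []
    while state != "accept":
        tok = tokens[pos] if pos < len(tokens) else "$"
        pop, push, shift, actions, goto = program[state][tok]
        if pop:                  # (pop) part
            del stack[-pop:]
        if push:                 # push only when a later reduce may
            stack.append(state)  # need to uncover this state
        if shift:                # (shift) part
            pos += 1
        trace.extend(actions)    # several rule actions in one step
        state = goto             # goto part
    return trace, stack

# Toy program for the single rule "S -> a b": the shift of "b", the
# reduce action, and the goto are fused into one instruction.
program = {
    0: {"a": (0, False, True, [], 1)},
    1: {"b": (0, False, True, ["S -> a b"], 2)},
    2: {"$": (0, False, False, [], "accept")},
}
```

Note that this toy language is regular, so no instruction ever pushes and the stack stays empty, in line with the previous paragraph.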
MSTA also searches for equivalent tokens to decrease the size of the generated parser. This optimization is especially effective for scanners.
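The idea behind this optimization can be sketched as follows (an illustration, not MSTA's code): tokens whose columns are identical in every state of the action table behave identically everywhere and can share one column, shrinking the table.

```python
def equivalent_token_classes(table, tokens, states):
    """table: {(state, token): action}.  Groups tokens whose action
    column is the same in all states; each group can share a single
    column in the generated tables."""
    classes = {}
    for t in tokens:
        column = tuple(table.get((s, t)) for s in states)
        classes.setdefault(column, []).append(t)
    return list(classes.values())

# "+" and "-" get the same action in every state, so they are
# equivalent; "n" is not.
table = {(0, "+"): "s1", (0, "-"): "s1", (0, "n"): "s2",
         (1, "+"): "r1", (1, "-"): "r1", (1, "n"): "s2"}
classes = equivalent_token_classes(table, ["+", "-", "n"], [0, 1])
```

In a scanner, where many characters (e.g. all letters) act identically in most states, such classes are large, which is why the optimization is especially effective there.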