This release adds an important step to the code generation process: minimization of the underlying DFA (deterministic finite automaton). Simply put, re2c now generates less code, while the generated code behaves in exactly the same way.
DFA minimization is a well-known technique, and one might expect that any self-respecting lexer generator would use it. So how could it be that re2c didn’t? In fact, re2c did use a couple of self-invented tricks to compress the generated code (one interesting technique is the construction of a tunnel automaton). Some of these tricks were quite buggy (see this bug report for an example). Now that re2c does canonical DFA minimization, all this stuff is obsolete and has been dropped.
A lot of attention has been paid to the correctness of DFA minimization. Usually re2c uses a very simple criterion to validate changes: the generated code for all tests in the test suite must remain the same. With DFA minimization, however, the generated code changes dramatically, and it is impossible to verify the changes manually.
One possible verification tool is the skeleton. Because the skeleton is constructed prior to DFA minimization, it cannot be affected by any errors in the minimization’s implementation.
Another way to verify DFA minimization is to implement two different algorithms and compare their results. Minimization has a very useful property: the minimal DFA is unique (up to state relabelling), so any two correct implementations must agree. We used Moore’s algorithm and the so-called table filling algorithm: Moore’s algorithm is fast, while table filling is very simple to implement.
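The cross-check described above can be sketched in Python. The DFA below is a toy example invented for illustration (re2c’s actual implementation is different and operates on its internal automaton representation); the point is that both algorithms must produce the same partition of states into equivalence classes.

```python
from itertools import combinations

# Toy DFA, invented for illustration: states 1 and 2 are equivalent,
# so the minimal DFA has 3 states instead of 4.
STATES = [0, 1, 2, 3]
ALPHABET = ['a', 'b']
DELTA = {
    (0, 'a'): 1, (0, 'b'): 2,
    (1, 'a'): 3, (1, 'b'): 3,
    (2, 'a'): 3, (2, 'b'): 3,
    (3, 'a'): 3, (3, 'b'): 3,
}
ACCEPTING = {3}

def minimize_moore(states, alphabet, delta, accepting):
    """Moore's algorithm: start from the accepting/non-accepting split
    and keep splitting blocks until the partition is stable."""
    partition = [b for b in (set(accepting), set(states) - set(accepting)) if b]
    while True:
        block_of = {q: i for i, block in enumerate(partition) for q in block}
        refined = []
        for block in partition:
            groups = {}
            for q in block:
                # States with different transition "signatures" are split apart.
                sig = tuple(block_of[delta[q, a]] for a in alphabet)
                groups.setdefault(sig, set()).add(q)
            refined.extend(groups.values())
        if len(refined) == len(partition):
            return partition
        partition = refined

def minimize_table(states, alphabet, delta, accepting):
    """Table filling: mark all distinguishable pairs of states,
    then merge the remaining (equivalent) states into blocks."""
    marked = {frozenset(p) for p in combinations(states, 2)
              if (p[0] in accepting) != (p[1] in accepting)}
    changed = True
    while changed:
        changed = False
        for p, q in combinations(states, 2):
            if frozenset((p, q)) in marked:
                continue
            for a in alphabet:
                succ = frozenset((delta[p, a], delta[q, a]))
                if len(succ) == 2 and succ in marked:
                    marked.add(frozenset((p, q)))
                    changed = True
                    break
    blocks = []
    for q in states:
        for block in blocks:
            # Equivalence is transitive, so one representative suffices.
            if frozenset((q, next(iter(block)))) not in marked:
                block.add(q)
                break
        else:
            blocks.append({q})
    return blocks

# The minimal DFA is unique up to relabelling, so the partitions must match.
canonical = lambda partition: {frozenset(b) for b in partition}
assert canonical(minimize_moore(STATES, ALPHABET, DELTA, ACCEPTING)) == \
       canonical(minimize_table(STATES, ALPHABET, DELTA, ACCEPTING))
```

Any disagreement between the two partitions would indicate a bug in at least one of the implementations, which is exactly what makes this redundancy useful for testing.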
There is an option --dfa-minimization <moore | table> that allows choosing a particular algorithm (it defaults to moore), but it is only useful for debugging.
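For example (the input and output file names here are illustrative):

```shell
# Generate a lexer using Moore's algorithm (the default):
re2c --dfa-minimization moore -o lexer_moore.c lexer.re

# Generate the same lexer using the table filling algorithm:
re2c --dfa-minimization table -o lexer_table.c lexer.re
```

Since the minimal DFA is unique up to state relabelling, both invocations should produce behaviorally identical lexers.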
A good side effect of reworking the re2c internals is a significant speedup of code generation (see this issue for an example). The test suite now runs at least twice as fast.
See the changelog for the full list of changes.