[-Wunreachable-rules]

A simple example

1
2
3
4
5
6
7
8
/*!re2c
    ""          { return ""; }
    *           { return "*"; }
    "a" | "b"   { return "a | b"; }
    "a"         { return "a"; }
    [\x00-\xFF] { return "[0 - 0xFF]"; }
    [^]         { return "[^]"; }
*/

Given this strange code, `re2c -i -Wunreachable-rules` says:

re2c: warning: line 2: unreachable rule (shadowed by rules at lines 4, 6) [-Wunreachable-rules]
re2c: warning: line 5: unreachable rule (shadowed by rule at line 4) [-Wunreachable-rules]
re2c: warning: line 7: unreachable rule (shadowed by rule at line 6) [-Wunreachable-rules]

A look at the generated code suggests that re2c was right:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
/* Generated by re2c 0.14.1.dev on Fri Nov  6 15:21:36 2015*/

{
        YYCTYPE yych;
        if (YYLIMIT <= YYCURSOR) YYFILL(1);
        yych = *YYCURSOR;
        switch (yych) {
        case 'a':       goto yy5;
        case 'b':       goto yy7;
        default:        goto yy3;
        }
        { return ""; }
yy3:
        ++YYCURSOR;
        { return "[0 - 0xFF]"; }
yy5:
        ++YYCURSOR;
yy6:
        { return "a | b"; }
yy7:
        ++YYCURSOR;
        yych = *YYCURSOR;
        goto yy6;
}

Clearly, all the reported rules are unreachable (some of them are not even present in the generated code). Default rule * at line 3 is also unreachable, but re2c appreciates paranoid attempts to handle default case and never reports unreachable default rule.

Infinite rules

A rule may be unreachable all by itself, without being shadowed by other rules:

1
2
3
/*!re2c
    [^]* { return "greeedy"; }
*/

This rule is so greedy that it never stops. It will continue eating input until YYFILL finally fails (`re2c -i -Wunreachable-rules`):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
/* Generated by re2c 0.14.1.dev on Fri Nov  6 21:36:56 2015*/

{
        YYCTYPE yych;
        goto yy0;
yy1:
        ++YYCURSOR;
yy0:
        if (YYLIMIT <= YYCURSOR) YYFILL(1);
        yych = *YYCURSOR;
        goto yy1;
        { return "greeedy"; }
}

And if we suppress YYFILL generation with re2c:yyfill:enable = 0;, lexer will loop until segfault or another disaster forces its untimely shutdown (unless you use --input-api custom and do some magic to stop it). And re2c warns us:

re2c: warning: line 2: unreachable rule  [-Wunreachable-rules]

All rules that include [^]* (or any component isomorphic to [^]*) are also infinite.

How it works

For each state of the generated DFA re2c calculates the set of all rules reachable from this state (including default rule and no rule at all). The algorithm starts with initial state and recurses to child states. Recursion stops if either current state is final (then its set is trivial) or the set for current state has already been calculated. Thus each state is processed only once and the algorithm has O(n) complexity (where n is the number of states in DFA). When all the sets have been calculated, re2c checks that the set of rules reachable from each accepting state includes either accepted rule or no rule at all.

Analyses is done regardless of the -Wunreachable-rules option, the option only controls if the warning is reported or not.

Real-world examples

In many real-world examples re2c reports unreachable [^] rule, which is quite understandable: [^] served as default rule long before the true default rule * was added to re2c. However, in some cases the warning is quite interesting. Here is an example of a real-world lexer (all the non-re2c code has beed removed):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
/*!re2c
    re2c:yyfill:check = 0;

    LNUM                [0-9]+
    DNUM                ([0-9]*[\.][0-9]+)|([0-9]+[\.][0-9]*)
    NUMBER              [-]?{LNUM}|{DNUM}
    ANY_CHAR            (.|[\n\t])
    NEWLINE             ("\r"|"\n"|"\r\n")
    TABS_AND_SPACES     [ \t]
    WHITESPACE          [ \t]+
    CONSTANT            [a-zA-Z_][a-zA-Z0-9_]*
    LABEL               [^=\n\r\t;&|^$~(){}!"\[]+
    TOKENS              [:,.\[\]"'()&|^+-/*=%$!~<>?@{}]
    OPERATORS           [&|^~()!]
    DOLLAR_CURLY        "${"
    SECTION_RAW_CHARS   [^\]\n\r]
    SINGLE_QUOTED_CHARS [^']
    RAW_VALUE_CHARS     [^\n\r;\000]
    LITERAL_DOLLAR      ("$"([^{\000]|("\\"{ANY_CHAR})))
    VALUE_CHARS         ([^$= \t\n\r;&|^~()!"'\000]|{LITERAL_DOLLAR})
    SECTION_VALUE_CHARS ([^$\n\r;"'\]\\]|("\\"{ANY_CHAR})|{LITERAL_DOLLAR})

    <INITIAL>"[" {}
    <ST_VALUE,ST_SECTION_VALUE,ST_OFFSET>"'"{SINGLE_QUOTED_CHARS}+"'" {}
    <ST_SECTION_RAW,ST_SECTION_VALUE>"]"{TABS_AND_SPACES}*{NEWLINE}? {}
    <INITIAL>{LABEL}"["{TABS_AND_SPACES}* {}
    <ST_OFFSET>{TABS_AND_SPACES}*"]" {}
    <ST_DOUBLE_QUOTES,ST_SECTION_VALUE,ST_VALUE,ST_OFFSET>{DOLLAR_CURLY} {}
    <ST_VARNAME>{LABEL} {}
    <ST_VARNAME>"}" {}
    <INITIAL,ST_VALUE>("true"|"on"|"yes"){TABS_AND_SPACES}* {}
    <INITIAL,ST_VALUE>("false"|"off"|"no"|"none"){TABS_AND_SPACES}* {}
    <INITIAL,ST_VALUE>("null"){TABS_AND_SPACES}* {}
    <INITIAL>{LABEL} {}
    <INITIAL>{TABS_AND_SPACES}*[=]{TABS_AND_SPACES}* {}
    <ST_RAW>{RAW_VALUE_CHARS} {}
    <ST_SECTION_RAW>{SECTION_RAW_CHARS}+ {}
    <ST_VALUE,ST_RAW>{TABS_AND_SPACES}*{NEWLINE} {}
    <ST_SECTION_VALUE,ST_VALUE,ST_OFFSET>{CONSTANT} {}
    <ST_SECTION_VALUE,ST_VALUE,ST_OFFSET>{NUMBER} {}
    <INITIAL>{TOKENS} {}
    <ST_VALUE>{OPERATORS}{TABS_AND_SPACES}* {}
    <ST_VALUE>[=] {}
    <ST_VALUE>{VALUE_CHARS}+ {}
    <ST_SECTION_VALUE,ST_OFFSET>{SECTION_VALUE_CHARS}+ {}
    <ST_SECTION_VALUE,ST_VALUE,ST_OFFSET>{TABS_AND_SPACES}*["] {}
    <ST_DOUBLE_QUOTES>["]{TABS_AND_SPACES}* {}
    <ST_DOUBLE_QUOTES>[^] {}
    <ST_SECTION_VALUE,ST_VALUE,ST_OFFSET>{WHITESPACE} {}
    <INITIAL,ST_RAW>{TABS_AND_SPACES}+ {}
    <INITIAL>{TABS_AND_SPACES}*{NEWLINE} {}
    <INITIAL,ST_VALUE,ST_RAW>{TABS_AND_SPACES}*[;][^\r\n]*{NEWLINE} {}
    <ST_VALUE,ST_RAW>[^] {}
    <*>[^] {}
*/

`re2c -cF -Wunreachable-rules` says:

re2c: warning: line 54: unreachable rule in condition 'ST_DOUBLE_QUOTES' (shadowed by rules at lines 47, 48) [-Wunreachable-rules]
re2c: warning: line 49: unreachable rule in condition 'ST_OFFSET' (shadowed by rule at line 45) [-Wunreachable-rules]
re2c: warning: line 54: unreachable rule in condition 'ST_RAW' (shadowed by rules at lines 36, 38, 53) [-Wunreachable-rules]
re2c: warning: line 49: unreachable rule in condition 'ST_SECTION_VALUE' (shadowed by rule at line 45) [-Wunreachable-rules]
re2c: warning: line 54: unreachable rule in condition 'ST_VALUE' (shadowed by rules at lines 38, 39, 40, 42, 43, 44, 46, 49, 53) [-Wunreachable-rules]

The interesting part is the unreachable rule on line 49 in conditions ST_OFFSET and ST_SECTION_VALUE. The rule is {WHITESPACE}:

WHITESPACE [ \t]+

re2c claims that it is shadowed by the rule on line 45, which is {SECTION_VALUE_CHARS}+:

ANY_CHAR            (.|[\n\t])
LITERAL_DOLLAR      ("$"([^{\000]|("\\"{ANY_CHAR})))
SECTION_VALUE_CHARS ([^$\n\r;"'\]\\]|("\\"{ANY_CHAR})|{LITERAL_DOLLAR})

Indeed, {SECTION_VALUE_CHARS}+ allows all the patterns accepted by {WHITESPACE}. In the original program these rules return different types of tokens: perhaps this is not critical, but clearly unintended.