-Wmatch-empty-string

[-Wmatch-empty-string] warns when a rule is nullable, that is, when it can match an empty string. It was intended to prevent infinite looping in cases like the [hang.re] example below. The program loops over its arguments (the outer for loop) and tries to lex each argument (the inner for loop). The lexer stops when all input has been consumed and it sees the terminating NUL character. Arguments must consist of lowercase letters only.

#include <stdio.h>

int main(int argc, char **argv)
{
    for (int i = 1; i < argc; ++i) {
        for (char *YYCURSOR = argv[i];;) {
        /*!re2c
            re2c:define:YYCTYPE = char;
            re2c:yyfill:enable = 0;
            "\x00" { break; }
            [a-z]* { continue; }
        */
        }
        printf("argv[%d]: %s\n", i, argv[i]);
    }
    return 0;
}

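On well-formed input the program runs as expected. However, if one of the arguments contains a character other than a lowercase letter, the program hangs forever. At that character the rule [a-z]* matches the empty string, so YYCURSOR does not advance and the inner loop retries at the same position indefinitely: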

$ re2c -Wmatch-empty-string hang.re -o hang.c
hang.re:11:19: warning: rule matches empty string [-Wmatch-empty-string]
$ c++ -o hang hang.c
$
$ ./hang only lowercase letters
argv[1]: only
argv[2]: lowercase
argv[3]: letters
$
$ ./hang right ?
argv[1]: right
^C
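
The stall can be reproduced without re2c. The following is a minimal hand-written C sketch, not the code that re2c generates: match_lower_star and lex_nullable are hypothetical names that stand in for the rule [a-z]* and for the inner loop of hang.re, and the max_steps budget is an assumption added only so that the stall can be observed instead of hanging.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-written stand-in for the rule [a-z]*: returns how many leading
 * lowercase letters `s` has. The result may be 0, which is the empty match. */
static size_t match_lower_star(const char *s)
{
    size_t n = 0;
    while (s[n] >= 'a' && s[n] <= 'z') ++n;
    return n;
}

/* Mimics the inner loop of hang.re, but gives up after `max_steps`
 * iterations (an assumption of this sketch, the real lexer has no budget).
 * Returns 1 if the terminating NUL was reached, 0 if the budget ran out. */
static int lex_nullable(const char *cursor, int max_steps)
{
    assert(cursor != NULL);
    for (int step = 0; step < max_steps; ++step) {
        if (*cursor == '\0') return 1;      /* rule "\x00": stop            */
        cursor += match_lower_star(cursor); /* rule [a-z]*: may advance by 0 */
    }
    return 0; /* the cursor never reached NUL: the real lexer would spin */
}
```

With this sketch, lex_nullable("right", 100) returns 1, while lex_nullable("?", 100) returns 0 because every iteration matches zero characters and leaves the cursor on the question mark.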

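The fix is easy: make the rule non-nullable (say, [a-z]+) and add the default rule *. Note that adding the default rule * alone is already enough to stop the hang: on an unexpected character the default rule matches one character while the nullable rule matches none, and the longer match wins, so the lexer takes the default rule instead of the nullable one.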
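
The effect of the fix can be sketched in plain C as well. This is a hand-written approximation, not re2c output: count_lower and lex_fixed are hypothetical names, and returning -1 from the default rule is an assumed action, since the text does not say what the default rule should do.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the leading lowercase letters of `s`. */
static size_t count_lower(const char *s)
{
    size_t n = 0;
    while (s[n] >= 'a' && s[n] <= 'z') ++n;
    return n;
}

/* Mimics the lexer with the non-nullable rule [a-z]+ and a default rule.
 * Returns 0 if the whole argument was lexed, -1 if the default rule fired
 * (the error action is an assumption of this sketch). */
static int lex_fixed(const char *cursor)
{
    assert(cursor != NULL);
    for (;;) {
        if (*cursor == '\0') return 0;        /* rule "\x00": stop           */
        size_t n = count_lower(cursor);
        if (n > 0) { cursor += n; continue; } /* rule [a-z]+: consumes >= 1  */
        return -1;                            /* default rule *: bad input   */
    }
}
```

Every path through the loop either consumes at least one character or returns, so the loop cannot stall: lex_fixed("lowercase") returns 0 and lex_fixed("?") returns -1.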

In some cases matching an empty string makes perfect sense: for example, it might be used as a non-consuming default rule, or it might be used to lex an optional lexeme (if the corresponding rule doesn’t match, the lexer jumps to another block and resumes lexing at the same input position). All these cases are valid, so if [-Wmatch-empty-string] becomes annoying, it can be silenced with [-Wno-match-empty-string].
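
As an illustration of the optional-lexeme case, here is a minimal hand-written C sketch, again not re2c output. The signed-integer grammar and the names lex_optional_sign and lex_signed_int are assumptions chosen for the example. The empty match is harmless here because control always moves on to the next block rather than looping back to the nullable rule. Overflow is not handled.

```c
#include <assert.h>
#include <stddef.h>

/* First block: an optional sign. Matching nothing is fine, because the
 * caller always continues with the next block at the same position. */
static int lex_optional_sign(const char **cursor)
{
    assert(cursor != NULL && *cursor != NULL);
    if (**cursor == '-') { ++*cursor; return -1; }
    if (**cursor == '+') { ++*cursor; return 1; }
    return 1; /* empty match: the cursor is left where it was */
}

/* Second block: one or more digits up to the terminating NUL.
 * Returns 0 and stores the value in `out` on success, -1 otherwise. */
static int lex_signed_int(const char *cursor, long *out)
{
    assert(out != NULL);
    int sign = lex_optional_sign(&cursor);
    if (*cursor < '0' || *cursor > '9') return -1; /* needs at least one digit */
    long value = 0;
    while (*cursor >= '0' && *cursor <= '9')
        value = value * 10 + (*cursor++ - '0');
    if (*cursor != '\0') return -1;                /* trailing garbage */
    *out = sign * value;
    return 0;
}
```

For "42" the sign block matches the empty string and the digit block resumes at the same position. For "-7" the sign block consumes one character first.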