Fake sentinel

This example explores the case when we know the length of the input, but there is no terminating character and buffering is not possible. In such cases we cannot use the usual sentinel method, and we cannot use the YYLIMIT-based method because it requires YYMAXFILL padding. The choice then is to use the generic API: disable the default checking mechanism with re2c:yyfill:enable = 0; and use one of the primitives YYPEEK and YYSKIP to check for the end of input.

In this example we use YYPEEK to emulate a fake sentinel: every time the lexer peeks a new character, it first checks for the end of input. If the end has already been reached, YYPEEK returns NUL (though the actual string has no terminating NUL). Checking on every YYPEEK is less efficient than the usual sentinel method (which performs no bounds checking at all), but it can be more efficient than copying the input to a buffer and padding it with a real sentinel character.
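The bounded peek described above can be sketched in isolation, without re2c. This is only an illustration under stated assumptions: the helper names `bounded_peek` and `length_until_fake_sentinel` are hypothetical and are not part of re2c; they mirror what a `YYPEEK` definition of the form `(cur < lim ? *cur : 0)` does on a buffer that has no terminating NUL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a re2c API): return the current character,
 * or 0 (the fake sentinel) once cur has reached or passed lim.
 * The buffer itself is never read beyond lim. */
static char bounded_peek(const char *cur, const char *lim)
{
    return cur < lim ? *cur : 0;
}

/* Hypothetical helper: advance through the buffer until the peek
 * yields 0, and return how many characters were consumed.
 * Assumes the input contains no NUL of its own. */
static size_t length_until_fake_sentinel(const char *buf, size_t len)
{
    const char *cur = buf, *lim = buf + len;

    assert(buf != NULL);
    while (bounded_peek(cur, lim) != 0) {
        ++cur;
    }
    return (size_t)(cur - buf);
}
```

Calling `length_until_fake_sentinel` on a three-byte array with no terminator stops after the third byte, because the peek substitutes 0 at that point instead of dereferencing past the end.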

Note that the fake sentinel method also relies on the fact that the sentinel cannot appear in the middle of well-formed input. If the input can contain arbitrary characters, then one should use YYSKIP instead, as shown in this example.
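The limitation can be seen in a small hand-written sketch. It does not reproduce the YYSKIP-based example referred to above; it only contrasts a scan that stops on a 0 value with one that decides the end of input purely by position. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Fake-sentinel scan: stops at the first 0 it sees, whether that 0 is
 * a real byte inside the input or the fake one produced at lim. */
static size_t scan_with_fake_sentinel(const char *buf, size_t len)
{
    const char *cur = buf, *lim = buf + len;

    assert(buf != NULL);
    while ((cur < lim ? *cur : 0) != 0) {
        ++cur;
    }
    return (size_t)(cur - buf);
}

/* Position-based scan: the end of input is detected by comparing
 * pointers only, so a 0 byte in the data is treated like any other. */
static size_t scan_by_position(const char *buf, size_t len)
{
    const char *cur = buf, *lim = buf + len;

    assert(buf != NULL);
    while (cur < lim) {
        ++cur;
    }
    return (size_t)(cur - buf);
}
```

For the three-byte input `{'a', 0, 'b'}`, the first function reports one byte consumed and the second reports three: the fake sentinel cannot tell an embedded NUL from the end of input.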

[fake_sentinel.re]

#include <stdio.h>
#include <string.h>

static int lex(const char *cur, const char *lim)
{
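    const char *mar, *tok = cur;
#   define YYCTYPE     char
    /* return the fake sentinel (0) once the end of input is reached */
#   define YYPEEK()    (cur < lim ? *cur : 0)
#   define YYSKIP()    ++cur
    /* save and restore the position for backtracking */
#   define YYBACKUP()  mar = cur
#   define YYRESTORE() cur = mar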
    /*!re2c
        re2c:yyfill:enable = 0;

        * { printf("error\n"); return 1; }
        [0-9a-zA-Z]+ [;] [\x00] {
            printf("%.*s\n", (int) (cur - tok) - 1, tok);
            return 0;
        }
    */
}

int main(int argc, char **argv)
{
    if (argc != 2) return 1;

    char *s = argv[1];
    size_t l = strlen(s);
    s[l] = ';'; // overwrite terminating NUL
    return lex(s, s + l + 1);
}

Compile:

$ re2c --input custom -o fake_sentinel.cc fake_sentinel.re
$ g++ -o fake_sentinel fake_sentinel.cc

Run:

$ ./fake_sentinel somestring
somestring;