Examples

The examples have been written with two goals in mind. First, they are practical: each example solves a distinct real-world problem, ranging from simple recognizers to complex parsers that conform to real-world standards and specifications. Second, they show various aspects of using the re2c API:

Checking for the end of input: this can be done in a number of different ways. The simplest and most efficient way is the sentinel method demonstrated by the lexing numbers example: it should be used when there is a sentinel character that never appears in the middle of well-formed input, such as the NUL character in null-terminated strings. If the input is buffered, the sentinel should be appended at the end of the buffer. If appending is not possible, one can emulate a fake sentinel using the generic API. Another, more general (but also less efficient) method is based on comparing the current input position with the end position: that is, comparing YYCURSOR and YYLIMIT as explained in the parsing strings example, or using YYLESSTHAN with the generic API. By default, this method requires padding the input with YYMAXFILL fake characters; if padding is undesirable or impossible, one can override the checking mechanism using the generic API and perform checks on each input character (also used in the std::ifstream example).
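The difference between the two checks can be sketched in plain C. The block below is written by hand for illustration only: it is neither re2c output nor one of the examples mentioned above, and the function names are hypothetical. Both functions recognize a non-empty run of decimal digits.

```c
#include <stddef.h>

/* Sentinel method: the input is NUL-terminated and NUL never occurs inside
 * a well-formed number, so the loop needs no explicit bounds check. */
int is_decimal_sentinel(const char *cursor)
{
    if (*cursor < '0' || *cursor > '9') return 0;   /* at least one digit */
    while (*cursor >= '0' && *cursor <= '9') ++cursor;
    return *cursor == '\0';                         /* stopped at the sentinel */
}

/* Bounds-check method: the cursor is compared against the limit before
 * every read, so the input does not have to be terminated at all. */
int is_decimal_bounded(const char *cursor, const char *limit)
{
    if (cursor == limit || *cursor < '0' || *cursor > '9') return 0;
    while (cursor < limit && *cursor >= '0' && *cursor <= '9') ++cursor;
    return cursor == limit;                         /* consumed everything */
}
```

The sentinel version performs one comparison fewer per character, which is where its efficiency advantage comes from; the bounded version works on arbitrary slices of a larger buffer.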

Handling large input: how to organize buffering and how to refill the buffer with YYFILL. Some additional details of handling tags in YYFILL are illustrated in the parsing URI and parsing HTTP messages examples.
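The refill step can be sketched as follows, assuming a small fixed-size buffer and an in-memory string standing in for the real input source. This is a hand-written illustration, not the code re2c generates: `struct input`, `input_fill`, `input_peek`, `count_words` and `BUFSIZE` are all hypothetical names, and the lexer simply counts space-separated words.

```c
#include <stddef.h>
#include <string.h>

#define BUFSIZE 8   /* deliberately tiny so that refills actually happen */

struct input {
    const char *src;            /* stand-in for a file or socket */
    size_t src_len, src_pos;
    char buf[BUFSIZE];
    char *tok, *cur, *lim;      /* lexeme start, cursor, end of valid data */
};

void input_init(struct input *in, const char *src, size_t len)
{
    in->src = src; in->src_len = len; in->src_pos = 0;
    in->tok = in->cur = in->lim = in->buf;
}

/* Discard everything before the current lexeme, shift the rest to the
 * front of the buffer and append new data. Returns 0 if nothing was added. */
int input_fill(struct input *in)
{
    size_t kept = (size_t)(in->lim - in->tok);
    size_t shift = (size_t)(in->tok - in->buf);
    size_t free_space, avail, n;

    memmove(in->buf, in->tok, kept);
    in->tok -= shift; in->cur -= shift; in->lim -= shift;

    free_space = BUFSIZE - kept;
    avail = in->src_len - in->src_pos;
    n = free_space < avail ? free_space : avail;
    memcpy(in->lim, in->src + in->src_pos, n);
    in->src_pos += n;
    in->lim += n;
    return n > 0;
}

/* Next character without consuming it, or -1 when no more data can be read. */
int input_peek(struct input *in)
{
    if (in->cur == in->lim && !input_fill(in)) return -1;
    return (unsigned char)*in->cur;
}

size_t count_words(struct input *in)
{
    size_t words = 0;
    int c;
    for (;;) {
        in->tok = in->cur;      /* nothing before the cursor is needed now */
        c = input_peek(in);
        if (c < 0) return words;
        if (c == ' ') { ++in->cur; continue; }
        while ((c = input_peek(in)) >= 0 && c != ' ') ++in->cur;
        ++words;
    }
}
```

Keeping the bytes from `tok` onward is what allows a lexeme to straddle two refills. In this sketch a word longer than `BUFSIZE` cannot be refilled and is treated like end of input; a real lexer would report an error or grow the buffer instead.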

Using the storable state feature to write push-model lexers: this is necessary when the input comes in chunks that are controlled by the outside program. In such a case the lexer must be stopped when there is not enough input and later resumed from the same point.
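The idea of stopping and resuming can be illustrated with a small hand-written state machine. This is only a sketch of the push model, not the code re2c emits for storable state; `struct push_lexer` and its functions are hypothetical. The lexer sums unsigned decimal numbers, treats every other character as a separator, and ignores overflow.

```c
#include <stddef.h>

/* Everything the lexer needs in order to resume lives in this struct,
 * so the caller may deliver input in chunks of any size. */
struct push_lexer {
    int in_number;          /* saved state: are we in the middle of a number? */
    unsigned long value;    /* digits of the current number seen so far */
    unsigned long sum;      /* sum of all completed numbers */
    size_t count;           /* how many numbers have been completed */
};

void push_lexer_init(struct push_lexer *lx)
{
    lx->in_number = 0;
    lx->value = 0;
    lx->sum = 0;
    lx->count = 0;
}

/* Consume one chunk and return; a number cut off by the end of the chunk
 * stays pending in the saved state until the next call. */
void push_lexer_feed(struct push_lexer *lx, const char *chunk, size_t len)
{
    size_t i;
    for (i = 0; i < len; ++i) {
        char c = chunk[i];
        if (c >= '0' && c <= '9') {
            lx->value = lx->value * 10 + (unsigned long)(c - '0');
            lx->in_number = 1;
        } else if (lx->in_number) {
            lx->sum += lx->value;
            ++lx->count;
            lx->value = 0;
            lx->in_number = 0;
        }
    }
}

/* Signal end of input: flush the pending number, if there is one. */
void push_lexer_finish(struct push_lexer *lx)
{
    if (lx->in_number) {
        lx->sum += lx->value;
        ++lx->count;
        lx->value = 0;
        lx->in_number = 0;
    }
}
```

Feeding "12 3" and then "4 5" yields the numbers 12, 34 and 5: the number split across the chunk boundary is completed on the second call because its partial value was stored between calls.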

Submatch extraction: using s-tags to store input positions corresponding to various parts of the regular expression in variables (outlined by parsing IPv4 address and also used in parsing /etc/passwd file format, parsing command-line options and arguments and parsing URI); using m-tags to handle repeated submatch and store repeated values efficiently in the form of a prefix tree (outlined by parsing non-recursive records and structures and also used in parsing HTTP messages).
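To show what stored input positions look like in practice, here is a hand-written sketch that records the start of each octet of an IPv4 address in a pointer array, roughly the role s-tag variables play. It does not use re2c and is not the linked example; `parse_octet`, `parse_ipv4` and the `tags` array are hypothetical names. The sketch accepts leading zeros and expects a NUL-terminated string.

```c
#include <stddef.h>

/* Read one to three digits at *p and advance *p past them.
 * Returns the value (0..255), or -1 if there is no valid octet. */
int parse_octet(const char **p)
{
    int value = 0, digits = 0;
    while (**p >= '0' && **p <= '9' && digits < 3) {
        value = value * 10 + (**p - '0');
        ++*p;
        ++digits;
    }
    return (digits == 0 || value > 255) ? -1 : value;
}

/* On success returns 1, stores the start position of each octet in tags[]
 * and the packed 32-bit address in *addr. On failure returns 0. */
int parse_ipv4(const char *s, const char *tags[4], unsigned long *addr)
{
    const char *p = s;
    unsigned long result = 0;
    int i;

    for (i = 0; i < 4; ++i) {
        int octet;
        if (i > 0) {
            if (*p != '.') return 0;
            ++p;
        }
        tags[i] = p;                    /* remember where this octet begins */
        octet = parse_octet(&p);
        if (octet < 0) return 0;
        result = (result << 8) | (unsigned long)octet;
    }
    if (*p != '\0') return 0;           /* trailing garbage */
    *addr = result;
    return 1;
}
```

With re2c, the positions would be recorded by the generated matcher instead of by hand, which removes the need for a separate pass over the matched text.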

Using the generic API, either to override the input mechanism (outlined by the std::ifstream example), or to tweak it (as explained in the fake sentinel and strings in binaries examples).

Switching between different lexing modes using multiple interrelated sub-lexers: either in a semi-automated manner with the re2c conditions feature (outlined by a simple example of parsing integers and also used in parsing Braille patterns), or manually with the use of multiple blocks (outlined by another example of parsing integers and a more complex example of a C++98 lexer).
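Mode switching can be sketched by hand as a loop that dispatches on a condition variable. This illustrates the concept only; it is not what re2c generates for conditions, and `enum condition` and `parse_integer` are hypothetical names. The initial mode inspects the prefix and selects a binary, decimal or hexadecimal sub-lexer; overflow is not checked.

```c
enum condition { C_INIT, C_BIN, C_DEC, C_HEX };

/* Parse "0b101", "0x1F" or "42" from a NUL-terminated string.
 * Returns the value, or -1 if the input is malformed. */
long parse_integer(const char *s)
{
    enum condition cond = C_INIT;
    long value = 0;
    int digits = 0;

    for (;;) {
        char c = *s;
        int d;

        if (cond == C_INIT) {
            /* The initial mode only decides which sub-lexer takes over. */
            if (c == '0' && (s[1] == 'x' || s[1] == 'X')) { cond = C_HEX; s += 2; }
            else if (c == '0' && (s[1] == 'b' || s[1] == 'B')) { cond = C_BIN; s += 2; }
            else if (c >= '0' && c <= '9') cond = C_DEC;
            else return -1;
            continue;
        }

        if (c == '\0') return digits > 0 ? value : -1;

        /* Convert the character to a digit value, or -1 if it is not a digit. */
        if (c >= '0' && c <= '9') d = c - '0';
        else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
        else d = -1;

        switch (cond) {
        case C_BIN: if (d < 0 || d > 1) return -1; value = value * 2 + d;  break;
        case C_DEC: if (d < 0 || d > 9) return -1; value = value * 10 + d; break;
        case C_HEX: if (d < 0)          return -1; value = value * 16 + d; break;
        default:    return -1;
        }
        ++digits;
        ++s;
    }
}
```

Each mode accepts a different set of characters for the same input position, which is the situation that conditions or multiple blocks are meant to handle.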

Reusing the same set of rules to generate multiple lexers with different options.

Using various encodings.

All examples are written in C90 and C++98, so they should be quite portable. However, many examples use new re2c features and options, and they have been tested on the latest re2c version. Please report any errors and contribute new examples!