re2c: Regular Expressions to Code
re2c stands for Regular Expressions to Code. It is a free and open-source lexer generator that supports C/C++, D, Go, Haskell, Java, JavaScript, OCaml, Python, Rust, V, and Zig, and can be extended to other languages by implementing a single syntax file. The primary focus of re2c is on generating fast code: it compiles regular expressions to deterministic finite automata and translates them into direct-coded lexers in the target language (such lexers are generally faster and easier to debug than their table-driven analogues). The secondary focus of re2c is on flexibility: it does not assume a fixed program template; instead, it allows the user to embed lexers anywhere in the source code and configure them to avoid unnecessary buffering and bounds checks. The internal algorithm used by re2c is based on a special kind of deterministic finite automaton: lookahead TDFA. These automata are as fast as ordinary DFA, but they are also capable of performing submatch extraction with minimal overhead. re2c is used in other open-source projects, such as php, ninja, yasm, spamassassin, BRL-CAD, and wake.
Read the manual for C/C++, D, Go, Haskell, Java, JS, OCaml, Python, Rust, V, Zig.
Run examples in the playground.
Subscribe to receive release notes.
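To make the term "direct-coded" above more concrete, here is a minimal hand-written sketch in C. It is not output produced by re2c, and the names `lex_number`, `enum token` and the `dot` parameter are hypothetical, chosen only for this illustration. Each DFA state is a label in the code and each transition is a branch or `goto`, so there is no transition table and no state variable. Recording the offset of the dot is only a loose stand-in for submatch extraction; the code re2c actually generates for tags will look different.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch (not part of re2c). */
enum token { TOK_ERROR, TOK_INT, TOK_FLOAT };

/*
 * Hand-written, direct-coded recognizer for the whole NUL-terminated
 * string `s`:
 *   integer: [0-9]+
 *   float:   [0-9]+ "." [0-9]+
 * On TOK_FLOAT, *dot holds the offset of the "." in `s`.
 */
static enum token lex_number(const char *s, size_t *dot)
{
    const char *p = s;
    assert(s != NULL && dot != NULL);
    *dot = 0;

    /* state 0: at least one digit is required */
    if (*p < '0' || *p > '9') return TOK_ERROR;
    ++p;

int_part: /* state 1: inside the integer part */
    if (*p >= '0' && *p <= '9') { ++p; goto int_part; }
    if (*p == '\0') return TOK_INT;
    if (*p != '.') return TOK_ERROR;
    *dot = (size_t)(p - s); /* remember where the dot was seen */
    ++p;

    /* state 2: at least one digit is required after the dot */
    if (*p < '0' || *p > '9') return TOK_ERROR;
    ++p;

frac_part: /* state 3: inside the fractional part */
    if (*p >= '0' && *p <= '9') { ++p; goto frac_part; }
    return *p == '\0' ? TOK_FLOAT : TOK_ERROR;
}
```

A table-driven lexer would instead keep the current state in a variable and look up the next state in a two-dimensional array on every input character. In the direct-coded form the control flow itself encodes the automaton, which is why stepping through it in a debugger shows exactly which state the lexer is in.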
Download
You can get the latest release on GitHub, as well as the older releases. Many Linux distributions and other systems provide their own packages. The source code is hosted on both GitHub (https://github.com/skvadrik/re2c) and SourceForge (https://sourceforge.net/p/re2c). GitHub serves as the main repository, bug tracker, and tarball host. SourceForge is used as a backup repository and for email hosting.
Bugs & patches
Please send bug reports, patches and other feedback to the GitHub issue tracker or email them to the re2c-devel@lists.sourceforge.net and re2c-general@lists.sourceforge.net mailing lists. There is an IRC channel #re2c on irc.libera.chat and irc.oftc.net. Questions and contributions are welcome!
Papers
2022 A closer look at TDFA by Angelo Borsotti and Ulya Trofimovich. arXiv:2206.01398 [pdf 2022]
2020 RE2C: A lexer generator based on lookahead-TDFA by Ulya Trofimovich. Software Impacts 6 (2020) 100027 [pdf 2021]
2019 Efficient POSIX submatch extraction on NFA by Angelo Borsotti and Ulya Trofimovich. Software: Practice and Experience 51, 2, pp. 159–192 [pdf 2019]
2017 Tagged Deterministic Finite Automata with Lookahead by Ulya Trofimovich. arXiv:1907.08837 [pdf 2017]
1994 RE2C: a more versatile scanner generator by Peter Bumbulis and Donald D. Cowan. ACM Letters on Programming Languages and Systems (LOPLAS) [ps 1994]
License
re2c is distributed with no warranty whatever. The code is certain to contain errors. Neither authors nor contributors take any responsibility for the consequences of its use.
re2c is in the public domain. Data structures and algorithms used in re2c are all either taken from documents available to the general public or are inventions of the authors. Programs generated by re2c may be distributed freely. re2c itself may be distributed freely, in source or binary, unchanged or modified. Distributors may charge whatever fees they can obtain for re2c.
If you do make use of re2c, or incorporate it into a larger project, an acknowledgement somewhere (documentation, research report, etc.) would be appreciated.
Version
This website describes re2c version 3.0.