re2c   —   Regular Expressions to Code

re2c stands for Regular Expressions to Code. It is a free and open-source lexer generator that supports C/C++, D, Go, Haskell, Java, JavaScript, OCaml, Python, Rust, V and Zig, and it can be extended to other languages by implementing a single syntax file. The primary focus of re2c is on generating fast code: it compiles regular expressions to deterministic finite automata and translates them into direct-coded lexers in the target language (such lexers are generally faster and easier to debug than their table-driven analogues). The secondary focus is on flexibility: re2c does not assume a fixed program template; instead, it allows the user to embed lexers anywhere in the source code and to configure them to avoid unnecessary buffering and bounds checks. The internal algorithm used by re2c is based on a special kind of deterministic finite automaton, the lookahead TDFA. These automata are as fast as ordinary DFA, but they are also capable of performing submatch extraction with minimal overhead. re2c is used in other open-source projects, such as php, ninja, yasm, spamassassin, BRL-CAD and wake.
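To make the distinction between direct-coded and table-driven lexers more concrete, here is a small hand-written sketch in C. It is not actual re2c output, and the names `match_number`, `match_number_table`, `char_class`, `next_state` and `check_examples` are hypothetical, chosen only for this illustration. Both functions recognize the longest prefix matching `[1-9][0-9]*` in a NUL-terminated string. The first encodes each DFA state as a position in the code with `switch` and `goto`; the second walks a transition table. The terminating NUL acts as a sentinel, so neither version needs an explicit bounds check.

```c
#include <assert.h>
#include <stddef.h>

/* Direct-coded DFA for [1-9][0-9]*: each state is a place in the code.
 * Returns the length of the longest matching prefix, or 0 if none. */
size_t match_number(const char *s)
{
    const char *cur = s;

    /* Initial state: only a nonzero digit is accepted. */
    switch (*cur) {
    case '1': case '2': case '3': case '4': case '5':
    case '6': case '7': case '8': case '9':
        ++cur;
        goto state1;
    default:
        return 0; /* no match; the NUL sentinel also ends up here */
    }

state1: /* Accepting state: loop on any digit, stop on anything else. */
    switch (*cur) {
    case '0': case '1': case '2': case '3': case '4':
    case '5': case '6': case '7': case '8': case '9':
        ++cur;
        goto state1;
    default:
        return (size_t)(cur - s);
    }
}

/* Table-driven version of the same automaton, for comparison. */
static int char_class(unsigned char c)
{
    if (c >= '1' && c <= '9') return 0; /* nonzero digit */
    if (c == '0') return 1;             /* zero */
    return 2;                           /* anything else, including NUL */
}

/* next_state[state][class]; -1 means "no transition". */
static const int next_state[2][3] = {
    { 1, -1, -1 }, /* state 0: initial */
    { 1,  1, -1 }, /* state 1: accepting */
};

size_t match_number_table(const char *s)
{
    const char *cur = s;
    int state = 0;

    for (;;) {
        int next = next_state[state][char_class((unsigned char)*cur)];
        if (next < 0)
            break;
        state = next;
        ++cur;
    }
    return state == 1 ? (size_t)(cur - s) : 0;
}

/* Usage: both versions agree on these inputs. */
void check_examples(void)
{
    assert(match_number("1234") == 4);
    assert(match_number("42abc") == 2);
    assert(match_number("0123") == 0);
    assert(match_number_table("1234") == 4);
    assert(match_number_table("42abc") == 2);
    assert(match_number_table("0123") == 0);
}
```

In the direct-coded version the current state is implicit in the program counter, so stepping through it in a debugger shows directly which state the automaton is in. The table-driven version keeps the state in a variable and pays for a table lookup on every input character.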

Read the manual for C/C++, D, Go, Haskell, Java, JS, OCaml, Python, Rust, V, Zig.

Run examples in the playground.

Subscribe to receive release notes.

Download

You can get the latest release on GitHub, as well as the older releases. Many Linux distributions and other systems provide their own packages. The source code is hosted on both GitHub (https://github.com/skvadrik/re2c) and SourceForge (https://sourceforge.net/p/re2c). GitHub serves as the main repository, bugtracker and tarball hosting. SourceForge is used as a backup repository and email hosting.

Bugs & patches

Please send bug reports, patches and other feedback to the GitHub issue tracker, or email them to the re2c-devel@lists.sourceforge.net and re2c-general@lists.sourceforge.net mailing lists. There is an IRC channel #re2c on irc.libera.chat and irc.oftc.net. Questions and contributions are welcome!

Papers

Authors

re2c was originally written by Peter Bumbulis (peter@csg.uwaterloo.ca) in 1993. Marcus Boerger and Dan Nuffer spent several years turning the original idea into a production-ready code generator. Since then it has been maintained and developed by multiple volunteers, most notably Brian Young (bayoung@acm.org), Marcus Boerger, Dan Nuffer (nuffer@users.sourceforge.net), Ulya Trofimovich (skvadrik@gmail.com), Serghei Iakovlev, Sergei Trofimovich, Petr Skocik, ligfx and raekye. Many thanks to all other contributors!

License

re2c is distributed with no warranty whatever. The code is certain to contain errors. Neither authors nor contributors take any responsibility for the consequences of its use.

re2c is in the public domain. Data structures and algorithms used in re2c are all either taken from documents available to the general public or are inventions of the authors. Programs generated by re2c may be distributed freely. re2c itself may be distributed freely, in source or binary, unchanged or modified. Distributors may charge whatever fees they can obtain for re2c.

If you do make use of re2c, or incorporate it into a larger project, an acknowledgement somewhere (documentation, research report, etc.) would be appreciated.

Version

This website describes re2c version 3.0.