This is a comparison of regular expression engines.
Libraries
List of regular expression libraries
Name | Official website | Programming language | Software license | Used by |
---|---|---|---|---|
Boost.Regex[1] | Boost C++ Libraries | C++ | Boost | Notepad++ >= 6.0.0, EmEditor |
Boost.Xpressive | Boost C++ Libraries | C++ | Boost | |
DEELX | RegExLab | C++ | Proprietary | |
FREJ[2] | Fuzzy Regular Expressions for Java | Java | LGPL | |
GLib/GRegex[3] | GLib reference manual | C | LGPL | |
GNU regex | Gnulib reference manual | C | LGPL | GNU libc, GNU programs |
GRETA | Microsoft Research | C++ | Proprietary | |
Gregex | Grovf Inc. | RTL, HLS | Proprietary | FPGA-accelerated regex engine (over 100 Gbit/s) for the cybersecurity, financial and e-commerce industries |
Hyperscan | Intel | C, x86-specific assembly (SSSE3+)[4] | 3-clause BSD | Rspamd |
ICU | International Components for Unicode | C, C++[5] | ICU | Foundation (Apple and Swift open-source versions) |
Jakarta Regexp | The Apache Jakarta Project | Java | Apache | |
java.util.regex | Java's User manual | Java | GNU GPLv2 with Classpath exception | jEdit |
JRegex | JRegex | Java | BSD | |
MATLAB | Regular Expressions | MATLAB Language | Proprietary | |
Oniguruma | Kosako | C | BSD | Atom, Take Command Console, Tera Term, TextMate, Sublime Text, SubEthaEdit, EmEditor, jq, Ruby |
Pattwo | Stevesoft | Java (compatible with Java 1.0) | LGPL | |
PCRE | pcre.org | C, C++[6] | BSD | Apache HTTP Server, Nginx, BBEdit, Edbrowse, Julia, HHVM, Notepad++ < 6.0.0, PHP, Delphi, R, Exim, SWI-Prolog, Elixir, Erlang |
Qt/QRegExp | Digia | C++ | Qt GNU GPL v. 3.0, | Kate, Kile |
regex - Henry Spencer's regular expression libraries | ArgList | C | BSD | |
RE2 | RE2 | C++ | BSD | Go, Google Sheets, Gmail, G Suite |
Henry Spencer's Advanced Regular Expressions | Tcl | C | BSD | |
RGX | RGX | C++ based component library | P6R | |
RXP | Titan IC | RTL | Proprietary | Hardware-accelerated regex search, available for ASIC, FPGA and cloud deployment; enables massively parallel content processing |
SubReg | Matt Bucknall | C | MIT | |
TPerlRegEx | TPerlRegEx VCL Component | Object Pascal | MPLv1.1 | |
TRE[7] | Ville Laurikari | C | BSD | musl |
TRegExpr | TRegExpr, documentation | Object Pascal | Dual-license: freeware, or LGPL with static linking exception | Total Commander |
Wolfram Language (Mathematica) | Wolfram Language Documentation Center | Wolfram Language | Proprietary | Mathematica, the Wolfram Development Platform |
XRegExp | XRegExp | JavaScript | MIT | |
Languages
List of languages and frameworks including regular expression support
Language | Official website | Software license | Remarks |
---|---|---|---|
ActionScript 3 | ActionScript Technology Center | Free | |
APL (APLX, Dyalog, GNU) | APL Wiki | Licensed by the respective implementation | ⎕SS (PCRE), ⎕R/⎕S (PCRE), ⎕SS (PCRE2), respectively |
C++11 (C++) | C++ standards website | Licensed by the respective implementation | Since ISO/IEC 14882:2011(E); the default grammar is similar to ECMAScript (Grammar Description) |
D | D | Boost Software License[8] | |
Elixir | elixir-lang.org | Apache 2.0 | The standard library includes a PCRE-based Regex module. Its matching algorithms are based on the PCRE library, but not all of PCRE is interfaced, and some parts of the library go beyond what PCRE offers. At the time of writing, PCRE version 8.40 (released 2017-01-11) was used. |
Erlang | erlang.org | Apache 2.0 | The standard library includes a PCRE-based re module. Its matching algorithms are based on the PCRE library, but not all of PCRE is interfaced, and some parts of the library go beyond what PCRE offers. At the time of writing, PCRE version 8.40 (released 2017-01-11) was used. |
Free Pascal (Object Pascal) | freepascal.org | LGPL with static linking exception | Free Pascal 2.6+ ships with TRegExpr by Sorokin and two other regular expression libraries; see wiki.lazarus.freepascal.org/Regexpr. |
Go | go.dev | BSD-style | |
Haskell | Haskell.org | BSD3 | Omitted in the language report, and in GHC's Hierarchical Libraries |
Java | Java | GNU General Public License | REs are written as strings in source code: all backslashes must be doubled, harming readability. |
JavaScript (ECMAScript) | ECMA-262 | BSD3 | Limited, but REs are first-class citizens of the language, with a dedicated /pattern/flags literal syntax. |
Julia | JuliaLang.org | MIT License | REs are part of the language core library using PCRE built-in and an optional wrapper for (C code) ICU is available. |
Lua | Lua.org | MIT License | Uses a simplified, limited dialect; can be bound to a more powerful library, such as PCRE, or to an alternative parser such as LPeg. |
Mathematica | Wolfram | Proprietary | |
.NET | MSDN | MIT License[9][10] | |
Nim | nim-lang.org | MIT License | The standard library includes the PCRE-based re and nre modules, as well as various alternatives (e.g. strutils, pegs (Parsing Expression Grammar matching), strscans, parseutils). |
OCaml | Caml | LGPL | As of 2010, the standard module was generally regarded as deprecated;[11] often-recommended libraries are pcre (with full support for PCRE) and re (which is less complete but claims better performance and provides frontends to popular syntaxes: PCRE, Perl, POSIX, Emacs, shell globbing). |
Perl | Perl.com | Artistic License, or GNU General Public License | Full, central part of the language |
PHP | PHP.net | PHP License | Has two implementations, of which PCRE is the more efficient in both speed and functionality |
POSIX C (C) | POSIX.1 web publication | Licensed by the respective implementation | Supports POSIX BRE and ERE syntax |
Python | python.org | Python Software Foundation License | Python has two major regular expression modules: the built-in re and the third-party regex library. |
Ruby | ruby-lang.org | GNU Library General Public License | Ruby 1.8, Ruby 1.9, and Ruby 2.0 and later versions use different engines: Ruby 1.9 integrates Oniguruma, and Ruby 2.0 and later integrate Onigmo, a fork of Oniguruma. |
Rust | docs.rs | MIT License | The primary regex crate does not allow look-around expressions. There is an Oniguruma binding called onig that does. |
SAP ABAP | SAP.com | Proprietary | |
Tcl | tcl.tk | Tcl/Tk License (BSD-style) | The Tcl library doubles as a regular expression library. |
Wolfram Language | Wolfram Research | Proprietary: usable for free on a limited scale on the Wolfram Development platform | |
XML Schema | W3C | Licensed by the respective implementation | |
XPath 3/XQuery | W3C | Licensed by the respective implementation | |
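The Java remark about doubled backslashes applies to any language whose patterns are written as ordinary string literals. The minimal sketch below uses Python rather than Java (Python is used for all examples in this article): in a normal string literal every regex backslash must be doubled, while a raw string literal (r"...") passes backslashes through unchanged. The pattern and sample strings are illustrative assumptions.

```python
import re

# In an ordinary string literal "\\" denotes a single backslash, so the regex
# \d+\.\d+ (digits, a literal dot, digits) needs every backslash doubled.
doubled = "\\d+\\.\\d+"

# A raw string literal keeps backslashes as written, so no doubling is needed.
raw = r"\d+\.\d+"

# Both literals produce exactly the same pattern text.
assert doubled == raw

for text in ["3.14", "3x14"]:
    # The escaped dot matches only a literal ".", so "3x14" is rejected.
    print(text, bool(re.fullmatch(raw, text)))
```

JavaScript sidesteps the problem differently: its /pattern/flags literal is parsed as a regular expression directly, so no string-level escaping is involved.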
Language features
NOTE: An application using a library for regular expression support does not necessarily support the full set of features of the library. For example, GNU grep accepts PCRE syntax only through its -P option; its default basic and extended syntaxes have no lookahead, though PCRE does.
Part 1
Language feature comparison (part 1)
Engine | "+" quantifier | Negated character classes | Non-greedy quantifiers[12] | Shy groups[13] | Recursion | Look-ahead | Look-behind | Backreferences[14] | >9 indexable captures |
---|---|---|---|---|---|---|---|---|---|
Boost.Regex | Yes | Yes | Yes | Yes | Yes[15] | Yes | Yes | Yes | Yes |
Boost.Xpressive | Yes | Yes | Yes | Yes | Yes[16] | Yes | Yes | Yes | Yes |
CL-PPCRE | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
EmEditor | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | No |
FREJ | No[17] | No | Some[18] | Yes | No | No | No | Yes | Yes |
GLib/GRegex | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
GNU grep | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | — |
Haskell | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
RXP | Yes | Yes | Yes | Yes | No | No | No | Yes | Yes |
ICU Regex | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
Java | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
JavaScript (ECMAScript) | Yes | Yes | Yes | Yes | No | Yes | Yes[19] | Yes | Yes |
JGsoft | Yes | Yes | Yes | Yes | Yes[20] | Yes | Yes | Yes | Yes |
Lua | Yes | Yes | Some[21] | No | No | No | No | Yes | No |
.NET | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
OCaml | Yes | Yes | No | No | No | No | No | Yes | No |
PCRE | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
Perl | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
PHP | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
Python | Yes | Yes | Yes | Yes | Yes[22] | Yes | Yes | Yes | Yes |
Qt/QRegExp | Yes | Yes | Yes | Yes | No | Yes | No | Yes | Yes |
RE2 | Yes | Yes | Yes | Yes | No | No | No | No | Yes |
Ruby, Onigmo | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
TRE | Yes | Yes | Yes | Yes | No | No | No | Yes | No |
Vim | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | No |
RGX | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
Tcl | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
TRegExpr | Yes | ? | Yes | ? | ? | ? | ? | ? | ? |
XML Schema | Yes | Yes | No | — | No | No | No | No | — |
XPath 3/XQuery | Yes | Yes | Yes | Yes | No | No | No | Yes | Yes |
XRegExp | Yes | Yes | Yes | Yes | No | Yes | Yes[23] | Yes | Yes |
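Most Part 1 features can be tried in Python's built-in re module, which the table lists as supporting all of them (recursion only through the optional regex library, so it is omitted here). The following is a minimal sketch; the patterns and sample strings are illustrative assumptions, apart from the backreference pattern ([ab]+)\1 taken from the notes. It also shows that the claim that this pattern does not match "abaab" holds for whole-string matching, while a search still finds a match inside that string.

```python
import re

# "+" quantifier: one or more of the preceding token.
assert re.fullmatch(r"a+", "aaa")

# Negated character class: any character except a digit.
assert re.findall(r"[^0-9]", "a1b2") == ["a", "b"]

# Greedy vs non-greedy: ".*" takes as much as possible, ".*?" as little.
html = "<b>bold</b>"
assert re.match(r"<.*>", html).group() == "<b>bold</b>"
assert re.match(r"<.*?>", html).group() == "<b>"

# Shy (non-capturing) group: groups for repetition but creates no capture.
m = re.fullmatch(r"(?:ab)+(c)", "ababc")
assert m.groups() == ("c",)

# Look-ahead: "foo" only when followed by "bar", without consuming "bar".
assert re.findall(r"foo(?=bar)", "foobar foobaz") == ["foo"]

# Look-behind: digits only when preceded by "$".
assert re.findall(r"(?<=\$)\d+", "cost $42, qty 7") == ["42"]

# Backreference: \1 must repeat exactly the text captured by group 1.
# As a whole-string match, "abab" succeeds and "abaab" fails...
assert re.fullmatch(r"([ab]+)\1", "abab")
assert re.fullmatch(r"([ab]+)\1", "abaab") is None
# ...but a search still finds the repeated "aa" inside "abaab".
assert re.search(r"([ab]+)\1", "abaab").group() == "aa"

# More than nine captures: \10 refers to group 10, not group 1 then "0".
ten_groups = "(.)" * 10 + r"\10"
assert re.fullmatch(ten_groups, "abcdefghijj")
```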
Part 2
Language feature comparison (part 2)
Engine | Directives[24] | Conditionals | Atomic groups[25] | Named capture[26] | Comments | Embedded code | Unicode property support[27] | Balancing groups[28] | Variable-length look-behinds[29] |
---|---|---|---|---|---|---|---|---|---|
Boost.Regex | Yes | Yes | Yes | Yes | Yes | No | Some[30] | No | No |
Boost.Xpressive | Yes | No | Yes | Yes | Yes | No | No | No | No |
CL-PPCRE | Yes | Yes | Yes | Yes | Yes | Yes | Some[31] | No | No |
EmEditor | Yes | Yes | ? | ? | Yes | No | ? | No | No |
FREJ | No | No | Yes | Yes | Yes | No | ? | No | No |
GLib/GRegex | Yes | Yes | Yes | Yes | Yes | No | Some[32] | No | No |
GNU grep | Yes | Yes | ? | Yes | Yes | No | No | No | No |
Haskell | ? | ? | ? | ? | ? | No | No | No | No |
RXP | Yes | Yes | No | Yes | Yes | No | No | No | No |
ICU Regex | Yes | No | Yes | Yes[33] | Yes | No | Yes | No | No |
Java | Yes | No | Yes | Yes[34] | Yes | No | Some[35] | No | No |
JavaScript (ECMAScript) | No | No | No | Yes | No | No | Some[36][37][38] | No | Yes |
JGsoft | Yes | Yes | Yes | Yes | Yes | No | Some[39] | No | Yes |
Lua | No | No | No | No | No | No | No | No | No |
.NET | Yes | Yes | Yes | Yes | Yes | No | Some[40] | Yes | Yes |
OCaml | No | No | No | No | No | No | No | No | No |
PCRE | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | No |
Perl | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | No[41] |
PHP | Yes | Yes | Yes | Yes | Yes | No | No | No | No |
Python | Yes | Yes | Yes[42] | Yes | Yes | No | Yes[43] | No | Yes[44] |
Qt/QRegExp | No | No | No | No | No | No | No | No | No |
RE2 | Yes | No | ? | Yes | No | No | Some[45] | No | No |
Ruby, Onigmo | Yes | Yes | Yes | Yes | Yes | No | Some[46] | No | No |
Tcl | Yes | No | Yes | No | Yes | No | Yes | No | No |
TRE | Yes | No | No | No | Yes | No | ? | No | No |
Vim | Yes | No | Yes | No | No | No | No | No | Yes |
RGX | Yes | Yes | Yes | Yes | Yes | No | Yes | No | No |
XML Schema | No | No | No | No | No | No | Yes | No | No |
XPath 3/XQuery | No | No | No | No | No | No | Yes | No | No |
XRegExp | Leading only | No | No | Yes | Yes | No | Yes | No | Yes |
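Several Part 2 features can likewise be sketched with Python's built-in re module. The directive example "(?i:test)" comes from the notes; the other patterns and sample strings are illustrative assumptions. The sketch also shows that re itself rejects a variable-length look-behind, in line with the note that this support comes only from the optional regex library, and it skips the atomic-group check on Python versions before 3.11, where re does not accept that syntax.

```python
import re
import sys

# Directive (inline flag modifier) "(?i:test)": case-insensitivity applies
# only inside the group, so the trailing "ING" must still be upper case.
assert re.fullmatch(r"(?i:test)ING", "TeStING")
assert re.fullmatch(r"(?i:test)ING", "testing") is None

# Conditional (?(1)yes|no): here a closing ">" is required only when the
# optional opening "<" (group 1) was matched.
cond = r"(<)?\w+(?(1)>)"
assert re.fullmatch(cond, "<tag>")
assert re.fullmatch(cond, "tag")
assert re.fullmatch(cond, "<tag") is None

# Named capture, also reused as a named backreference (?P=word).
m = re.fullmatch(r"(?P<word>\w+) (?P=word)", "hello hello")
assert m.group("word") == "hello"

# Comments: (?#...) is ignored by the engine; re.VERBOSE additionally
# allows whitespace and "#" comments inside the pattern.
assert re.fullmatch(r"\d{3}(?# area code)-\d{4}", "555-1234")
verbose = re.compile(r"""
    \d{3}   # three digits
    -       # separator
    \d{4}   # four digits
""", re.VERBOSE)
assert verbose.fullmatch("555-1234")

# Atomic group (?>...): once it has matched, the engine does not backtrack
# into it, so "a+" keeps every "a" and the final "a" cannot match.
if sys.version_info >= (3, 11):
    assert re.fullmatch(r"(?>a+)a", "aaa") is None
    assert re.fullmatch(r"a+a", "aaa")

# Variable-length look-behind: rejected by re at compile time, while a
# fixed-width look-behind is accepted.
try:
    re.compile(r"(?<=a+)b")
    variable_lookbehind_ok = True
except re.error:
    variable_lookbehind_ok = False
assert not variable_lookbehind_ok
assert re.search(r"(?<=ab)c", "abc")
```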
API features
API feature comparison
Engine | Native UTF-16 support[47] | Native UTF-8 support[48] | Multi-line matching | Partial match[49] |
---|---|---|---|---|
Boost.Regex | No | No | Yes | Yes |
GLib/GRegex | Yes | Yes | Yes | Yes |
RXP | Yes | Yes | No | Yes |
ICU Regex | Yes | No | Yes | ? |
Java | Yes[50] | Yes[51] | Yes | Yes |
.NET | No[52] | Yes | Yes | ? |
PCRE | Yes[53] | Yes | Yes | Yes |
Qt/QRegExp | Yes | No | No | Yes[54] |
Qt/QRegularExpression | Yes | Yes | Yes | Yes |
Tcl | Yes | Yes[55] | Yes | ? |
TRE | Yes | Yes | Yes | ? |
RGX | No | No | Yes | ? |
wxWidgets::wxRegEx[56] | Yes | Yes | Yes | ? |
XRegExp | Yes | Yes | Yes | No |
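Two of the API columns can be illustrated with Python's re module, used here only as a stand-in (it does not appear in this table). The minimal sketch below shows how "^" changes meaning under multi-line matching, and why the encoding an engine works on matters: a supplementary character such as U+1F600 is one code point but several code units, so an engine operating on raw UTF-8 bytes, or a UCS-2 engine such as the one described for .NET in the notes, sees more than one unit where a code-point engine sees one character. The sample strings are illustrative assumptions.

```python
import re

text = "alpha\nbeta"

# Multi-line matching: by default "^" anchors only at the start of the
# string; with re.MULTILINE it also anchors right after each newline.
assert re.findall(r"^\w+", text) == ["alpha"]
assert re.findall(r"^\w+", text, re.MULTILINE) == ["alpha", "beta"]

# Code units: U+1F600 is one code point, but four bytes in UTF-8 and two
# 16-bit units (a surrogate pair) in UTF-16.
emoji = "\U0001F600"
utf8 = emoji.encode("utf-8")
utf16 = emoji.encode("utf-16-le")
assert len(emoji) == 1
assert len(utf8) == 4
assert len(utf16) // 2 == 2

# A str pattern works on code points, so "." matches the whole character...
assert re.fullmatch(r".", emoji)
# ...while a bytes pattern sees the UTF-8 encoding as four separate units.
assert re.fullmatch(rb".", utf8) is None
assert re.fullmatch(rb"....", utf8)

# Code points encodable in UTF-16: U+0000..U+10FFFF minus the 2,048
# surrogate code points, versus the 65,536 units a UCS-2 engine can see.
assert 0x110000 - 0x800 == 1_112_064
```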
See also
External links
- Regular Expression Flavor Comparison – Detailed comparison of the most popular regular expression flavors
- Regexp Syntax Summary
- Online Regular Expression Testing – with support for Java, JavaScript, .Net, PHP, Python and Ruby
- Implementing Regular Expressions – series of articles by Russ Cox, author of RE2
- Regular Expression Engines
References
1. Formerly called Regex++.
2. One of the fuzzy regular expression engines.
3. Included since version 2.13.0.
4. "Getting Started – Hyperscan 5.4.0 documentation". https://intel.github.io/hyperscan/dev-reference/getting_started.html#requirements
5. ICU4J, the Java version, does not support regular expressions.
6. C++ bindings were developed by Google and became officially part of PCRE in 2006.
7. One of the fuzzy regular expression engines.
8. "STD.regex - D Programming Language - Digital Mars". http://www.digitalmars.com/d/2.0/phobos/std_regex.html
9. "Dotnet/Corefx". GitHub. 16 February 2022. https://github.com/dotnet/corefx/blob/7116584186f8f3a886616aaf8cb5d4a982c60e27/src/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Regex.cs#L2
10. "Dotnet/Corefx". GitHub. 16 February 2022. https://github.com/dotnet/corefx#license
11. "Regex - Regular Expressions in OCaml". https://stackoverflow.com/questions/3221067#comment3323649_3221067
12. Non-greedy quantifiers match as few characters as possible, instead of the default as many. Note that many older, pre-POSIX engines were non-greedy and did not have greedy quantifiers at all.
13. Shy groups, also called non-capturing groups, cannot be referred to with backreferences; they are used to speed up matching where the group's content does not need to be accessed later.
14. Backreferences enable referring to previously matched groups in later parts of the regex and/or replacement string (where applicable). For instance, ([ab]+)\1 matches the whole string "abab" but not "abaab".
15. "Perl Regular Expression Syntax - 1.47.0". http://www.boost.org/doc/libs/1_47_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html#boost_regex.syntax.perl_syntax.recursive_expressions
16. "User's Guide - 1.47.0". http://www.boost.org/doc/libs/1_47_0/doc/html/xpressive/user_s_guide.html#boost_xpressive.user_s_guide.grammars_and_nested_matches.embedding_a_regex_by_reference
17. FREJ has no repetitive quantifiers, but has an "optional" element that behaves similarly to the simple "?" quantifier.
18. FREJ has no repetitive quantifiers, but has an "optional" element that behaves similarly to the simple "?" quantifier.
19. As of ES2018.
20. "Recursive Regex—Tutorial". https://www.rexegg.com/regex-recursion.php#engines
21. Lua's only non-greedy quantifier is -, a non-greedy version of *. It has no non-greedy versions of + or ?; for +, the non-greedy effect can be achieved by repeating the token followed by - (for example, aa- in place of a+?), but for ? there is no equivalent.
22. Supported by the optional regex library only. https://pypi.org/project/regex/#recursive-patterns-hg-issue-27
23. As of ES2018.
24. Also known as flag modifiers, mode modifiers or option letters. Example pattern: "(?i:test)".
25. Also called independent sub-expressions.
26. Similar to backreferences, but with names instead of indices.
27. "UTS #18: Unicode Regular Expressions". https://www.unicode.org/reports/tr18/
28. Special feature that allows balanced constructs to be matched without recursion.
29. Refers to the possibility of including quantifiers in look-behinds, thus making their length unpredictable.
30. Unicode property support may be incomplete, as products are continuously updated; all will be incomplete when a new Unicode revision is released, until they are updated to comply.
31. Unicode property support may be incomplete, as products are continuously updated; all will be incomplete when a new Unicode revision is released, until they are updated to comply.
32. Unicode property support may be incomplete, as products are continuously updated; all will be incomplete when a new Unicode revision is released, until they are updated to comply.
33. Available as of ICU55.
34. Available as of JDK7.
35. Unicode property support may be incomplete, as products are continuously updated; all will be incomplete when a new Unicode revision is released, until they are updated to comply.
36. Unicode property support may be incomplete, as products are continuously updated; all will be incomplete when a new Unicode revision is released, until they are updated to comply.
37. The support and range of properties depend on the implementation.
38. "ECMA-262, 9th edition, June 2018 ECMAScript® 2018 Language Specification". www.ecma-international.org. Retrieved 4 August 2020. https://www.ecma-international.org/ecma-262/9.0/#sec-runtime-semantics-unicodematchproperty-p
39. Unicode property support may be incomplete, as products are continuously updated; all will be incomplete when a new Unicode revision is released, until they are updated to comply.
40. Unicode property support may be incomplete, as products are continuously updated; all will be incomplete when a new Unicode revision is released, until they are updated to comply.
41. Experimental support added in v5.29.9.
42. Supported by Python v3.11 and later, and by the optional regex library. https://pypi.python.org/pypi/regex
43. May only be available in the regex library when used with Python versions after 3.3.
44. Supported by the optional regex library only. https://pypi.python.org/pypi/regex
45. Unicode property support may be incomplete, as products are continuously updated; all will be incomplete when a new Unicode revision is released, until they are updated to comply.
46. Unicode property support may be incomplete, as products are continuously updated; all will be incomplete when a new Unicode revision is released, until they are updated to comply.
47. Means the format can be used internally without explicit conversion.
48. Means the format can be used internally without explicit conversion.
49. Partial match of the whole regular expression. For example, the pattern ".*END$" partially matches any string, but fully matches only strings ending with END. http://www.boost.org/doc/libs/1_34_1/libs/regex/doc/partial_matches.html
50. Supports the Unicode 15.0 standard (as of Java SE 21). https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Character.html
51. Supports the Unicode 15.0 standard (as of Java SE 21). https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Character.html
52. The implementation retains the original UCS-2 support, so it recognizes only 64K characters in total (versus the 1,112,064 characters encodable in UTF-16). A Microsoft developer representative answered a bug report on this with "will not fix" in 2010.
53. Since version 8.30.
54. Partial matching is performed implicitly, requiring a separate call to matchedLength() if an exact match fails.
55. Tcl includes facilities to convert to and from UTF-8.
56. wxRegEx uses the system-supplied POSIX library if one is available; otherwise, and in Unicode mode, it uses Henry Spencer's library.