If I try to rewrite specific regex functionalities (e.g. substituting a string) in Python, a solution using the regex module is always faster.
Is regex written in C?
Regex is a language. It doesn't look like much of one, but it is.
Like every language, it is not written in anything. A language is a set of mathematical rules and restrictions. If we can say that it is written in anything at all, we would probably say that it is written in English. (Or in a specific English-based jargon for specifying languages, enriched with graphical and mathematical tools for expressing language rules.)
A specific implementation of the language (regex) is of course written in a specific language, but the language itself isn't.
As an example, the implementation of the re module that ships as part of the CPython implementation of the Python programming language is called the Secret Labs' Regular Expression Engine (sre), and is written in Python and C. More precisely, it consists of a compiler written in Python that compiles re regexes into byte code for a virtual machine, and a VM written in C that interprets that byte code.
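On CPython you can get a glimpse of the Python-level half of this pipeline with the re.DEBUG flag, which makes the compiler print the parsed pattern. The sketch below (debug_dump is a hypothetical helper) captures that output instead of letting it go to stdout. The exact format varies between CPython versions (some newer releases also append a listing of the generated code), and other implementations may print something different.

```python
import contextlib
import io
import re

def debug_dump(pattern):
    """Return what re prints when compiling `pattern` with re.DEBUG."""
    buf = io.StringIO()
    # re.DEBUG makes the Python-level parser/compiler print its
    # intermediate representation; capture it rather than printing.
    with contextlib.redirect_stdout(buf):
        re.compile(pattern, re.DEBUG)
    return buf.getvalue()

print(debug_dump(r"ab*"))
```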
The implementation that ships with Jython uses the same Python code and byte code, but the byte code VM is written in Java, not C.
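To make the compiler/VM split concrete, here is a hypothetical toy in Python. It is not sre and does not use sre's opcodes: a tiny compiler turns a very small regex dialect (literals, '.', and postfix '*' only; no escapes, groups, or alternation) into made-up instructions in the style of a Thompson/Pike VM, and a separate backtracking interpreter runs them. Because run depends only on the instruction list, it could be rewritten in C, Java, or RPython without touching the compiler, which is the kind of split described above.

```python
from typing import List, Tuple

# Hypothetical instruction format: (opcode, *operands). NOT sre's real byte code.
Instr = Tuple[object, ...]

def compile_regex(pattern: str) -> List[Instr]:
    """Compile literals, '.', and postfix '*' into toy byte code."""
    code: List[Instr] = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        atom = ("ANY",) if ch == "." else ("CHAR", ch)
        if i + 1 < len(pattern) and pattern[i + 1] == "*":
            # start: SPLIT body, after ; body: atom ; JMP start ; after:
            start = len(code)
            code.append(("SPLIT", start + 1, start + 3))
            code.append(atom)
            code.append(("JMP", start))
            i += 2
        else:
            code.append(atom)
            i += 1
    code.append(("MATCH",))
    return code

def run(code: List[Instr], text: str, pc: int = 0, sp: int = 0) -> bool:
    """Backtracking VM: does `code` match a prefix of text[sp:]?"""
    while True:
        op = code[pc]
        if op[0] == "CHAR":
            if sp >= len(text) or text[sp] != op[1]:
                return False
            pc, sp = pc + 1, sp + 1
        elif op[0] == "ANY":
            if sp >= len(text):
                return False
            pc, sp = pc + 1, sp + 1
        elif op[0] == "JMP":
            pc = op[1]
        elif op[0] == "SPLIT":
            # Greedy: try the loop body first, backtrack to the exit branch.
            if run(code, text, op[1], sp):
                return True
            pc = op[2]
        elif op[0] == "MATCH":
            return True
        else:
            raise ValueError("unknown opcode %r" % (op[0],))

code = compile_regex("ab*c")
print(code)
print(run(code, "abbbc"), run(code, "ab"))  # True False
```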
At first glance, IronPython looks similar: compiler in Python and VM in C#. However, if you look closer, the VM is actually a non-functional stub, and the real implementation is in C# and is based on System.Text.RegularExpressions from the CLI.
PyPy follows the standard pattern again: compiler in Python and the VM in RPython.
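If you need to know which of these engines is behind re in a given interpreter, platform.python_implementation() reports the implementation name. The mapping below is a hypothetical helper that simply restates this answer; it is not part of any library. It avoids f-strings and annotations so it would also parse on the Python 2.7-based releases of Jython and IronPython.

```python
import platform

# Regex backend per Python implementation, restating the answer above.
REGEX_BACKENDS = {
    "CPython": "sre: compiler in Python, byte code VM in C",
    "Jython": "sre: compiler in Python, byte code VM in Java",
    "IronPython": "C# implementation based on System.Text.RegularExpressions",
    "PyPy": "sre: compiler in Python, byte code VM in RPython",
}

def regex_backend():
    """Describe the regex engine behind `re` for the running interpreter."""
    impl = platform.python_implementation()
    return "%s: %s" % (impl, REGEX_BACKENDS.get(impl, "unknown implementation"))

print(regex_backend())
```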
And of course other languages have completely different flavors of regex. E.g. Ruby's Regexp is quite different from Python's re. And in Ruby, we have similar diversity: YARV uses an engine called Onigmo to implement its Regexp class, whereas JRuby uses joni.
So, to answer the question: for CPython's re module, the regex engine (the byte code VM that does the actual matching) is indeed written in C. To summarize: IronPython uses System.Text.RegularExpressions, Jython uses a compiler written in Python which compiles to byte code for a VM written in Java, PyPy uses a compiler written in Python which compiles to byte code for a VM written in RPython, and CPython uses a compiler written in Python which compiles to byte code for a VM written in C.
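To check the premise of the question on your own machine, here is a rough benchmark sketch comparing a fixed-string substitution done three ways: a hypothetical pure-Python scanning loop (replace_pure_python), a compiled re pattern's sub(), and str.replace. Timings depend on the interpreter, version, and input, so treat the numbers as indicative only. On CPython one would expect the loop to lose because re's matching runs in C, and str.replace, which also runs in C but without any regex machinery, is typically fastest for a plain literal.

```python
import re
import timeit

# Sample input: a fixed substring replacement, as in the question.
text = "the cat sat on the mat " * 1000
pattern = re.compile("cat")

def replace_pure_python(s, old, new):
    """Naive substring replacement with the scanning loop in Python (old must be non-empty)."""
    out = []
    i = 0
    while i < len(s):
        if s.startswith(old, i):
            out.append(new)
            i += len(old)
        else:
            out.append(s[i])
            i += 1
    return "".join(out)

candidates = {
    "pure Python loop": lambda: replace_pure_python(text, "cat", "dog"),
    "re.sub (C engine on CPython)": lambda: pattern.sub("dog", text),
    "str.replace (C, no regex)": lambda: text.replace("cat", "dog"),
}

# All three must agree before timing them means anything.
results = {name: fn() for name, fn in candidates.items()}
assert len(set(results.values())) == 1

for name, fn in candidates.items():
    print("%-30s %.4f s" % (name, timeit.timeit(fn, number=20)))
```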