-5

If I try to reimplement specific regex functionality (e.g. substituting a string) in pure Python, the solution that uses the regex module is always faster.

Is regex written in C?

4
  • It sounds like you think Python's regex implementation is not written in Python. You're probably right. Next question?
    Commented May 11, 2020 at 15:00
  • 3
    A regex engine can be written in any language. In the case of Python's re module, the regex engine is indeed written in C.
    – amon
    Commented May 11, 2020 at 15:00
  • 1
    @amon: Actually, as I just found out, that is untrue for almost all Python implementations and only half true even for CPython. IronPython simply uses the CLI's System.Text.RegularExpressions, Jython uses a compiler written in Python which compiles to byte code for a VM written in Java, PyPy uses a compiler written in Python which compiles to byte code for a VM written in RPython, and CPython uses a compiler written in Python which compiles to byte code for a VM written in C.
    Commented May 11, 2020 at 16:00
  • I am not voting to delete this question, though it is clearly off-topic, because of @JörgWMittag's interesting answer. But I would not vote to reopen it, either.
    – Doc Brown
    Commented May 12, 2020 at 14:29

1 Answer

5

Regex is a language. It doesn't look like much of one, but it is.

Like every language, it is not written in anything. A language is a set of mathematical rules and restrictions. If we can say that it is written in anything at all, we would probably say that it is written in English. (Or in a specific English-based jargon for specifying languages, enriched with graphical and mathematical tools for expressing language rules.)

A specific implementation of the language (regex) is of course written in a specific language, but the language itself isn't.

As an example, the implementation of the re module that ships as part of the CPython implementation of the Python programming language is called the Secret Labs' Regular Expression Engine (sre), and is written in Python and C. More precisely, it consists of a compiler written in Python that compiles re regexes into byte code for a virtual machine, and a VM written in C that interprets that byte code.
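One way to get a feel for the compiler half is the documented `re.DEBUG` flag, which makes the `re` module print debug information about the pattern while it compiles it. The following is only a small sketch: the helper name `dump_compilation` is made up for illustration, and the exact text printed differs between Python versions and implementations, so no particular output is shown.

```python
import contextlib
import io
import re


def dump_compilation(pattern: str) -> str:
    """Return the debug text the re module prints while compiling `pattern`.

    `dump_compilation` is a hypothetical helper, not part of the re module.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        # re.DEBUG asks the compiler to print its view of the pattern
        # (the parsed form, and on newer versions also the byte code).
        re.compile(pattern, re.DEBUG)
    return buffer.getvalue()


# Print the dump for a tiny pattern: one or more "a" followed by "b".
print(dump_compilation(r"a+b"))
```

On CPython, the dump is produced by the Python-side compiler before the compiled pattern is handed to the C-side matching engine, so it shows the intermediate form rather than anything about how matching itself is executed.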

The implementation that ships with Jython uses the same Python code and byte code, but the byte code VM is written in Java, not C.

At first glance, IronPython looks similar: compiler in Python and VM in C#. However, if you look closer, the VM is actually a non-functional stub, and the real implementation is in C# and is based on System.Text.RegularExpressions from the CLI.

PyPy follows the standard pattern again: compiler in Python and the VM in RPython.

And of course other languages have completely different flavors of regex. E.g. Ruby's Regexp is quite different from Python's re. And in Ruby, we have similar diversity: YARV uses an engine called Onigmo to implement its Regexp class whereas JRuby uses joni.

1
  • slightly misleading link: that specific file does not contain the actual VM logic; that's been stuffed over in the header file, which then gets called by the methods in the C file defining the Pattern object.
    Commented Feb 28 at 21:27
