AFAIK, Scala and Clojure are implemented in Java, and Java is implemented in C. I suppose that many or most languages are implemented in C, for instance Perl, Python and SQL. I don't know much about language implementation, but maybe you can tell me the rationale for favoring one implementation language (e.g. Java) over another (e.g. C)?

Are there any theories about one language implementing another, or do we have to resort to Turing machine theory, where a language is regarded as a Turing machine?

  • There is a Python compiler implemented in Python: pypy.org
    – SK-logic
    Commented Sep 10, 2013 at 12:09
  • Languages are not implemented in anything. Compilers, interpreters and virtual machines are.
    Commented Sep 10, 2013 at 12:09
  • Apart from what Martijn noted, Scala, for example, is even more complicated: the compiler is implemented in Scala, but the target runtime environment is the Java VM. So in what sense would "Scala be implemented in Java"?
    Commented Sep 10, 2013 at 12:20
  • @Martijn Pieters, there is one case in which a language is "implemented" in another language: when the only available specification exists in the form of a reference compiler/interpreter implementation, executable formal semantics, etc.
    – SK-logic
    Commented Sep 10, 2013 at 12:36
  • @JoachimSauer: at one time, the CLI backend even received support from Microsoft, but at the moment, neither Typesafe nor EPFL is dedicating any resources to it, and it will no longer be shipped with Scala 2.11. But in the Scala 1.x days, both ports were kept current and at feature parity. BTW: there are also people working on ECMAScript and LLVM backends.
    Commented Sep 10, 2013 at 15:39

2 Answers

Languages aren't implemented in anything. A language is a set of mathematical rules and restrictions. It is specified, typically in a semi-formal subset of English, sometimes in a specialized specification language.

Compilers and interpreters are implemented, and they can be implemented in anything, provided the language is powerful enough to express the necessary concepts. If the language is Turing-complete, then there is no problem. If you are trying to write an interpreter for a Turing-complete language in a language which isn't Turing-complete, you will have to get creative, though. But even that might be possible: e.g. if you are using a total functional language, you might be able to model the interpreter as co-recursion over co-data.

> AFAIK, Scala and Clojure are implemented in Java

Like I said: Scala and Clojure are languages, they aren't implemented in anything.

However, both languages have one main implementation each (AFAIK, anyway), though Clojure also has ports to other platforms. The Scala compiler and libraries are written completely in Scala. The Clojure libraries are written mostly in Clojure, while the compiler and the core data structures are written in the host platform's native systems programming language (Java for the JVM port, C# for the CLI port). The ClojureScript compiler, by contrast, is written in Clojure.

> and Java is implemented in C.

Again, Java is a language. It has a specification; it isn't implemented in anything.

There are many implementations of Java. Almost all of them are written in Java. For example, the javac compiler that ships with the Oracle JDK is written in Java (and, interestingly, it is based on the GJ compiler written by Martin Odersky, the designer of Scala).

Well, actually, if you consider the entire "Java Platform", then "Java" is three things:

  • the Java programming language
  • the Java Virtual Machine (JVM)
  • the Java Runtime Environment (JRE)

In almost all implementations of the Java Platform, both the Java compiler and the JRE class libraries are implemented in Java. The implementation language of the JVM differs from implementation to implementation: the HotSpot JVM, which ships with the Oracle JDK, is implemented in C++, but the Maxine JVM (also by Oracle) is implemented in Java. There is an experimental JVM implemented in ECMAScript, I believe, and there is a sort-of "toy" implementation of a subset of the 1st Edition JVM in Ruby.
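A virtual machine is just another program, so it can be written in whatever language is convenient. As a rough illustration, here is a minimal sketch of a stack-based interpreter in Python. The opcode names are loosely borrowed from JVM mnemonics purely for readability; the instruction format is a hypothetical toy and has nothing to do with real JVM bytecode or class files.

```python
# A toy stack-based virtual machine. Opcode names loosely mimic JVM
# mnemonics for readability only; this is NOT real JVM bytecode.

def run(program):
    """Execute a list of (opcode, *operands) tuples and return the result."""
    stack = []
    pc = 0  # program counter: index of the next instruction
    while pc < len(program):
        op, *args = program[pc]
        if op == "iconst":        # push an integer constant
            stack.append(args[0])
        elif op == "iadd":        # pop two ints, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "isub":        # pop two ints, push a - b
            b, a = stack.pop(), stack.pop()
            stack.append(a - b)
        elif op == "imul":        # pop two ints, push their product
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "ireturn":     # stop and return the top of the stack
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode: {op}")
        pc += 1
    raise RuntimeError("program ended without ireturn")


# (2 + 3) * 4
program = [("iconst", 2), ("iconst", 3), ("iadd",),
           ("iconst", 4), ("imul",), ("ireturn",)]
print(run(program))  # 20
```

Real JVMs add a great deal on top of this (class loading, verification, garbage collection, JIT compilation), but the core idea of decoding and executing instructions one by one is the same, and nothing about it requires C.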

Also, not all implementations of Java are implemented as compilers to JVML bytecode. There are compilers which compile Java to AMD64 machine code, for example, or ECMAScript or Lisp or …

> I suppose that many or most languages are implemented in C, for instance Perl, Python and SQL.

I'm not very familiar with Perl, but I'm pretty sure that not all implementations of Perl are written in C. Ponie, for example, is written in PASM and NQP and uses the PGE. No C anywhere. Implementations of Perl 6 are all over the place: Pugs was written in Haskell; Rakudo is written in Perl 6, NQP and PASM; there's an implementation written in Common Lisp, one in Scheme, and probably many others.

For Python, there are currently four major production-ready implementations. CPython is written in C, IronPython is written in C#, Jython is written in Java and PyPy is written in RPython, which is a statically typed language somewhat similar to Java but a proper syntactic and semantic subset of Python.

The first prototype for the new JIT compiler in PyPy was written in Prolog.

SQL also has many implementations that are not written in C. The Apache Derby database, HSQLDB and H2, for example, are written in Java, so their SQL compiler/interpreter is written in Java as well. SharpHSQL and VistaDB are examples of SQL databases written in C#.

> I don't know much about language implementation, but maybe you can tell me the rationale for favoring one implementation language (e.g. Java) over another (e.g. C)?

The reasons for choosing a language to implement an interpreter or compiler are the same as for any other application, really. Familiarity. Curiosity. Hype. Superstition. Technical superiority. Management pressure. Peer pressure. You name it.

There are, however, three classes of languages which make more sense than others to implement a compiler or interpreter:

  1. the source language of the compiler or interpreter, e.g. writing a Python interpreter in Python
  2. a language specifically designed for writing compilers or interpreters, e.g. ML
  3. a native systems programming language of the target platform (this is less relevant for a compiler and more relevant for an interpreter or a language runtime).

The reason why #1 is a good idea is that it lowers the barrier to entry for contributors. If you write your Python interpreter in Python, then every Python programmer can help you write it. To write an interpreter, you need to know the language being interpreted; if you write it in a different language, you need to know two languages. The downside is that the language may not be well suited to writing an interpreter or compiler: imagine, for example, writing a SQL interpreter in SQL (or, even more extreme, writing a CSS engine in CSS, which isn't even possible).
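As a small illustration of option #1, here is a sketch of an evaluator for a tiny subset of Python expressions, written in Python. It reuses the standard library's `ast` module as its parser, which shows one concrete advantage of implementing a language in itself: the host already understands the syntax. The supported subset (integers, variable names, `+`, `-`, `*`, `//`) is an arbitrary choice for this example.

```python
import ast
import operator

# Map Python AST operator node types to the functions that implement them.
BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.FloorDiv: operator.floordiv,
}


def evaluate(node, env):
    """Evaluate a tiny subset of Python expressions by walking the AST."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body, env)
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return node.value
    if isinstance(node, ast.Name):
        return env[node.id]          # look the variable up in our environment
    if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
        left = evaluate(node.left, env)
        right = evaluate(node.right, env)
        return BIN_OPS[type(node.op)](left, right)
    raise NotImplementedError(f"unsupported syntax: {ast.dump(node)}")


def run(source, env=None):
    # ast.parse(..., mode="eval") parses a single expression and
    # returns an ast.Expression node.
    tree = ast.parse(source, mode="eval")
    return evaluate(tree, env or {})


print(run("x * (y + 2)", {"x": 3, "y": 4}))  # 18
```

Anything outside the subset raises `NotImplementedError`, so extending the interpreter means adding one more `isinstance` branch per node type.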

I guess the reason why #2 is a good idea is obvious: if a language is designed for writing interpreters or compilers, then writing an interpreter or compiler in it will be much easier than in a language not designed for that purpose. The downside is that your collaborators need to learn two languages. Examples of languages designed for writing language implementations are ML and OMeta. Haskell and Scala are also good choices: even though they are not specifically designed for it, they share many characteristics with ML.

Choice #3 allows you to achieve tight integration with the target platform. For example, when writing a Ruby implementation that should integrate closely with the Unix platform, it makes sense to write at least the low-level language runtime in C, C++, D or a similar language. The disadvantages here are again that you need to know two languages and that the systems language may not be particularly good at writing language implementations.

Sometimes, you can hit the sweet spot, though. For example, F# was designed to be a native systems programming language for the CLI platform, and it borrows heavily from ML, so it is very good at writing compilers. So, if you write an F# compiler in F#, you have all the advantages and none of the disadvantages of #1, #2 and #3.

Note that you can take #2 to a much more extreme level. Why use just one language which is good at writing compilers? Why not use different languages which are good at writing different parts of compilers? Use lex for the lexer, yacc for the parser, Prolog for the type checker, Haskell for the semantic analyzer.
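The phase split described above can be shown on a small scale. The sketch below keeps the lexer, the parser and the evaluator as separate stages that only exchange plain data (a token list, then nested tuples), which is what makes it possible to write or generate each stage with a different tool. Here everything is hand-written Python for the sake of a self-contained example; it is not lex or yacc output, and treating `/` as integer floor division is an arbitrary choice.

```python
import operator
import re

# Stage 1: lexer. Turns a string into a list of (kind, text) tokens.
TOKEN_RE = re.compile(r"(?P<NUM>\d+)|(?P<OP>[-+*/()])|(?P<WS>\s+)|(?P<ERR>.)")


def lex(src):
    tokens = []
    for m in TOKEN_RE.finditer(src):
        if m.lastgroup == "WS":
            continue                      # skip whitespace
        if m.lastgroup == "ERR":
            raise SyntaxError(f"unexpected character {m.group()!r}")
        tokens.append((m.lastgroup, m.group()))
    tokens.append(("EOF", ""))
    return tokens


# Stage 2: recursive-descent parser. Builds nested (op, left, right) tuples.
#   expr := term (('+'|'-') term)*    term := factor (('*'|'/') factor)*
#   factor := NUM | '(' expr ')'
class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos]

    def advance(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def expr(self):
        node = self.term()
        while self.peek()[1] in ("+", "-"):
            node = (self.advance()[1], node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek()[1] in ("*", "/"):
            node = (self.advance()[1], node, self.factor())
        return node

    def factor(self):
        kind, text = self.advance()
        if kind == "NUM":
            return int(text)
        if text == "(":
            node = self.expr()
            if self.advance()[1] != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {text!r}")


def parse(tokens):
    p = Parser(tokens)
    tree = p.expr()
    if p.peek()[0] != "EOF":
        raise SyntaxError("trailing input")
    return tree


# Stage 3: evaluator. Walks the tuple tree produced by the parser.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.floordiv}


def evaluate(tree):
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left), evaluate(right))


print(evaluate(parse(lex("2 * (3 + 4) - 5"))))  # 9
```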

For example, the GHC Haskell compiler and libraries are written in Haskell, but the runtime system is written in C. Actually, a piece of the compiler is even written in Perl! GHC's C backend (the "via-C" route) generates C source code from Haskell. This C code is then handed off to GCC and compiled to assembly code. The assembly code is processed by a Perl script that makes some adjustments so that the generated code complies with the Haskell calling conventions (this cannot be expressed in C, so it cannot be included in the generated C source). Lastly, the assembly is handed off to the GNU assembler.
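The "generate C, then let a C compiler do the rest" strategy is easy to sketch. The following hypothetical Python snippet (not GHC's actual code) translates a tiny expression tree, made of ints, variable names and `(op, left, right)` tuples, into C source text. A real compiler would then write that text to a file and invoke a C compiler such as GCC on it; that step is left out here.

```python
# Hypothetical AST: an int, a variable name (str), or an (op, left, right) tuple.

def compile_expr(node):
    """Translate an expression tree into a fully parenthesized C expression."""
    if isinstance(node, int):
        return str(node)
    if isinstance(node, str):
        return node
    op, left, right = node
    return f"({compile_expr(left)} {op} {compile_expr(right)})"


def compile_function(name, params, body):
    """Emit a complete C function definition that returns a long."""
    param_list = ", ".join(f"long {p}" for p in params) or "void"
    return f"long {name}({param_list}) {{\n    return {compile_expr(body)};\n}}\n"


c_source = compile_function("f", ["x", "y"], ("*", "x", ("+", "y", 2)))
print(c_source)
# long f(long x, long y) {
#     return (x * (y + 2));
# }
```

Emitting fully parenthesized expressions sidesteps C's operator-precedence rules, a common shortcut in code generators that target C.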

  • > write an interpreter for a Turing-complete language in a language which isn't Turing-complete
    Isn't any language capable of implementing an interpreter for a Turing-complete language also necessarily Turing-complete?
    – Bob
    Commented Jul 9, 2014 at 6:03

It doesn't really matter what the Java or Python engine is implemented in, as long as the engine complies with the language specification. There are even some Java virtual machines that run natively on CPU hardware, so you can't assume it's always implemented in C.

One interesting case is bootstrapping, where a C/C++ compiler is used to recompile itself.

For example, let's say the developers of a C++ compiler are working on version 5, which fixes bugs and improves performance. They use version 4 to compile version 5, and then use that build to compile version 5 again. The result is a version 5 compiler that was built by version 5 itself, so its own binary benefits from the new bug fixes and performance improvements.

When a high-level language like Java is used to implement another language, it's really just a matter of building on a more useful foundation. Clojure is a step above Java in ease of use. Achieving the same from C++ would take so much effort that the project would be impractical, because you would have to reimplement all the underlying Java features that Clojure depends on.
