Languages aren't implemented in anything. A language is a set of mathematical rules and restrictions. It is specified, typically in a semi-formal subset of English, sometimes in a specialized specification language.
Compilers and interpreters are implemented, and they can be implemented in anything, provided the language is powerful enough to express the necessary concepts. If the language is Turing-complete, then there is no problem. If you are trying to write an interpreter for a Turing-complete language in a language which isn't Turing-complete, you will have to get creative, though. But even that might be possible, e.g. if you are using a Total Functional Language, you might be able to model the interpreter as co-recursion over co-data.
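One way to picture that last point, as a rough sketch in Python rather than in an actual total functional language: instead of one function that loops until the interpreted program finishes (which may be never), the interpreter is a step function that always terminates, and a run is the possibly infinite stream of states it produces. The tiny counter machine, its instruction names (inc, dec, jnz, halt) and the helper names are all invented for this illustration.

```python
from itertools import islice

def step(program, state):
    """Execute exactly one instruction; this always terminates."""
    pc, acc = state
    op, arg = program[pc]
    if op == "inc":
        return (pc + 1, acc + arg)
    if op == "dec":
        return (pc + 1, acc - arg)
    if op == "jnz":  # jump to `arg` if the accumulator is non-zero
        return (arg if acc != 0 else pc + 1, acc)
    if op == "halt":
        return None
    raise ValueError(f"unknown instruction: {op}")

def run(program, acc=0):
    """Lazily yield every state the machine passes through."""
    state = (0, acc)
    while state is not None:
        yield state
        state = step(program, state)

# A program that counts down to zero and halts ...
countdown = [("dec", 1), ("jnz", 0), ("halt", None)]
# ... and one that never halts.
forever = [("inc", 1), ("jnz", 0), ("halt", None)]

final_state = list(run(countdown, acc=3))[-1]
# The consumer decides how much of an infinite run to observe.
first_five = list(islice(run(forever), 5))
```

The non-terminating program is not a problem here, because nothing ever asks for the whole run at once; each request for the next state is a finite amount of work.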
AFAIK, Scala and Clojure are implemented in Java
Like I said: Scala and Clojure are languages, they aren't implemented in anything.
However, both languages only have one implementation (AFAIK, anyway). The Scala compiler and libraries are written completely in Scala. The Clojure compiler is written completely in Clojure, and the Clojure libraries are written almost completely in Clojure, with some small bits of the core data structures written in the host platform's native systems programming language (Java for the Java port, C# for the CLI port, ECMAScript for ClojureScript).
and Java is implemented in C.
Again, Java is a language. It has a specification, it isn't implemented in anything.
There are many implementations of Java. Almost all of them are written in Java. For example, the javac implementation of Java that ships with Oracle JDK is written in Java (and interestingly was written by Martin Odersky, the designer of Scala).
Well, actually, if you take the entire "Java Platform", then "Java" is actually three things:
- the Java programming language
- the Java Virtual Machine (JVM)
- the Java Runtime Environment (JRE)
In almost all implementations of the Java Platform, both the Java language and the JRE are implemented in Java. The implementation language of the JVM differs from platform to platform: the HotSpot JVM which ships with Oracle JDK is implemented in C++, but the Maxine JVM (also by Oracle) is implemented in Java. There is an experimental JVM implemented in ECMAScript, I believe, and there is a sort-of "toy" implementation of a subset of the 1st Edition JVM in Ruby.
Also, not all implementations of Java are implemented as compilers to JVML bytecode. There are compilers which compile Java to AMD64 machine code, for example, or ECMAScript or Lisp or …
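The distinction between a language and its implementations can be made concrete with a toy. The sketch below (in Python; the mini-language of nested tuples and all function names are made up for illustration) gives one tiny arithmetic language two independent implementations: a direct interpreter, and a compiler to instructions for a small stack machine. Neither of them is the language; both merely follow its rules.

```python
# A "program" is either a number or a tuple (op, left, right),
# where op is "+" or "*".

def interpret(expr):
    """Implementation 1: evaluate the tree directly."""
    if isinstance(expr, (int, float)):
        return expr
    op, left, right = expr
    a, b = interpret(left), interpret(right)
    return a + b if op == "+" else a * b

def compile_expr(expr):
    """Implementation 2, part one: translate the tree to stack-machine code."""
    if isinstance(expr, (int, float)):
        return [("push", expr)]
    op, left, right = expr
    instr = "add" if op == "+" else "mul"
    return compile_expr(left) + compile_expr(right) + [(instr, None)]

def run_stack(code):
    """Implementation 2, part two: a tiny machine that executes the code."""
    stack = []
    for instr, arg in code:
        if instr == "push":
            stack.append(arg)
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if instr == "add" else a * b)
    return stack.pop()

program = ("+", 2, ("*", 3, 4))  # 2 + 3 * 4
# Both implementations agree on what the program means.
assert interpret(program) == run_stack(compile_expr(program)) == 14
```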
I suppose that many or most languages are implemented in C, for instance Perl, Python and SQL.
I'm not very familiar with Perl, but I'm pretty sure that not all implementations of Perl are written in C. Ponie, for example, is written in PASM and NQP and uses the PGE. No C anywhere. Implementations of Perl6 are all over the place: Pugs was written in Haskell; Rakudo is written in Perl6, NQP and PASM; there's an implementation written in Common Lisp, one in Scheme and probably many others.
For Python, there are currently four major production-ready implementations. CPython is written in C, IronPython is written in C#, Jython is written in Java and PyPy is written in RPython, which is a statically typed language somewhat similar to Java but a proper syntactic and semantic subset of Python.
The first prototype for the new JIT compiler in PyPy was written in Prolog.
SQL also has many implementations that are not written in C. The Apache Derby database, HSQLDB and H2, for example, are written in Java, so their SQL compilers/interpreters are written in Java as well. SharpHSQL and VistaDB are examples of SQL databases written in C#.
I don't know much about language implementation, but maybe you can tell me the rationale for favoring one implementation language (Java) over another (C)?
The reasons for choosing a language to implement an interpreter or compiler are the same as for any other application, really. Familiarity. Curiosity. Hype. Superstition. Technical Superiority. Management Pressure. Peer Pressure. You name it.
There are, however, three classes of languages which make more sense than others to implement a compiler or interpreter in:
- the source language of the compiler or interpreter, e.g. writing a Python interpreter in Python
- a language specifically designed for writing compilers or interpreters, e.g. ML
- (this is actually less relevant for a compiler and more relevant for writing an interpreter or a language runtime) a native systems programming language of the target platform.
The reason why #1 is a good idea is that it reduces the barrier to entry for help in your project. If you write your Python interpreter in Python, then every Python programmer can help you write it. In order to write an interpreter, you need to know the language being interpreted; if you write it in a different language, you need to know two languages. The downside is that the language may actually not be well-suited to writing an interpreter or compiler: imagine, for example, writing a SQL interpreter in SQL (or even more extreme: write a CSS engine in CSS, which isn't even possible).
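To illustrate the first half of that argument, here is a minimal sketch of an evaluator for a very small subset of Python expressions, written in Python. It leans on the standard library's ast module for parsing, so it only covers the evaluation half of an interpreter; the function names and the choice of supported operators are mine, not taken from any real implementation.

```python
import ast
import operator

# Only these binary operators are supported in this sketch.
BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def evaluate(node, env):
    """Walk the syntax tree of a Python expression and compute its value."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body, env)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.Name):
        return env[node.id]  # variables are looked up in a plain dict
    if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
        apply_op = BINARY_OPS[type(node.op)]
        return apply_op(evaluate(node.left, env), evaluate(node.right, env))
    raise NotImplementedError(f"unsupported syntax: {type(node).__name__}")

def run(source, env=None):
    """Parse a source string as a single expression and evaluate it."""
    return evaluate(ast.parse(source, mode="eval"), env or {})

result = run("(x + 2) * 3 - 1", {"x": 4})  # (4 + 2) * 3 - 1
```

Anyone who can read Python can read this evaluator, and anyone who knows what the expression is supposed to mean can check it, which is exactly the lowered barrier to entry described above.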
I guess the reason why #2 is a good idea is obvious: if a language is designed for writing interpreters or compilers, then writing an interpreter or compiler will be much easier than in a language not designed for writing interpreters or compilers. The downside is that your collaborators need to learn two languages. Examples of languages designed for writing language implementations are ML and OMeta. Haskell and Scala are also good: even though they are not specifically designed for it, they share many characteristics with ML.
Choice #3 allows you to achieve tight integration with the target platform. For example, when writing a Ruby implementation which should integrate closely with the Unix platform, it makes sense to write at least the low-level language runtime in C, C++, D or a similar language. The disadvantages here are again that you need to know two languages and that the systems language may not be particularly good at writing language implementations.
Sometimes, you can hit the sweet spot, though. For example, F# was designed to be a native systems programming language for the CLI platform, and it borrows heavily from ML, so it is very good at writing compilers. So, if you write an F# compiler in F#, you have all the advantages and none of the disadvantages of #1, #2 and #3.
Note that you can take #2 to a much more extreme level. Why use just one language which is good at writing compilers? Why not use different languages which are good at writing different parts of compilers? Use lex for the lexer, yacc for the parser, Prolog for the type checker, Haskell for the semantic analyzer.
For example, the GHC Haskell compiler and libraries are written in Haskell, but the runtime system is written in C. Actually, a piece of the compiler is even written in Perl! The GCC backend generates C source code from Haskell. This C code is then handed off to GCC and compiled to assembly code. The assembly code is processed by a Perl script that makes some adjustments so that the generated code complies with the Haskell calling conventions (this cannot be expressed in C, therefore it is impossible to include that in the generated C source). Lastly, the assembly is handed off to the GNU assembler.