I was thinking about this a few days ago after an SQL optimization. I think we can agree that SQL is a "declarative language" by Wikipedia's definition:
Programming paradigm that expresses the logic of computation without describing its control flow
If you think about how many things are done behind the curtains (looking at statistics, deciding whether an index is useful, choosing between a nested-loop, merge, or hash join, etc.), we must admit that we give just a high-level logic, and the database takes care of all the low-level control flow.
Even in this scenario, the database optimizer sometimes needs some "hints" from the user to give the best results.
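To make the "behind the curtains" part concrete, here is a minimal sketch using SQLite through Python's standard sqlite3 module. SQLite is only a stand-in here (the original optimization was on a different database), and the table and index names are made up for illustration. `EXPLAIN QUERY PLAN` shows the access path the planner picked, and `NOT INDEXED` is SQLite's own hint-like clause that forbids index use:

```python
import sqlite3

# Hypothetical schema, used only to show the planner at work.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, cust_type TEXT);
    CREATE INDEX idx_customer_cust_type ON customer (cust_type);
""")

def plan(sql):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the human-readable plan step.
    return " | ".join(row[-1] for row in rows)

# We only say WHAT we want; the planner decides to use the index.
default_plan = plan("SELECT id FROM customer WHERE cust_type = 'A'")

# Same logical query, but with a hint that forbids index access.
hinted_plan = plan("SELECT id FROM customer NOT INDEXED WHERE cust_type = 'A'")

print(default_plan)
print(hinted_plan)
```

The exact wording of the plan text differs between SQLite versions, and other databases have their own hint syntax, so take this as an illustration of the idea rather than something portable.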
Another common definition of a "declarative" language is (I can't find an authoritative source):
Programming paradigm that expresses the desired result of computation without describing the steps to achieve it (also abbreviated with "describe what, not how")
If we accept this definition, we encounter the issues described by the OP.
The first issue is that SQL gives us multiple equivalent ways to define "the same result". That's probably a necessary evil: the more expressive power we give a language, the more likely it is to have different ways to express the same thing.
As an example, I've been asked once to optimize this query:
SELECT DISTINCT ct.cust_type, ct.cust_type_description FROM customer c INNER JOIN customer_type ct ON c.cust_type = ct.cust_type;
Since there were far fewer types than customers, and there was an index on cust_type in the customer table, I achieved a great improvement by rewriting it as:
SELECT ct.cust_type, ct.cust_type_description FROM customer_type ct WHERE EXISTS (SELECT 1 FROM customer c WHERE c.cust_type = ct.cust_type);
In this specific case, when I asked the developer what he wanted to achieve, he told me "I wanted all the customer types for which I had at least one customer", which, incidentally, is exactly how the optimized query could be described.
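If you want to convince yourself that the two formulations return the same rows, here is a small sketch with SQLite via Python's sqlite3 module. The sample data is invented, and SQLite is just a convenient stand-in, so this says nothing about the performance difference I saw on the real system; it only checks equivalence of the results:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_type (
        cust_type TEXT PRIMARY KEY,
        cust_type_description TEXT
    );
    CREATE TABLE customer (id INTEGER PRIMARY KEY, cust_type TEXT);
    CREATE INDEX idx_customer_cust_type ON customer (cust_type);

    -- Invented sample data: 'G' has no customers at all.
    INSERT INTO customer_type VALUES
        ('R', 'Retail'), ('W', 'Wholesale'), ('G', 'Government');
    INSERT INTO customer (cust_type) VALUES ('R'), ('R'), ('R'), ('W');
""")

# Original form: join every customer to its type, then de-duplicate.
join_rows = conn.execute("""
    SELECT DISTINCT ct.cust_type, ct.cust_type_description
    FROM customer c
    INNER JOIN customer_type ct ON c.cust_type = ct.cust_type
""").fetchall()

# Rewritten form: one probe per type, stopping at the first match.
exists_rows = conn.execute("""
    SELECT ct.cust_type, ct.cust_type_description
    FROM customer_type ct
    WHERE EXISTS (SELECT 1 FROM customer c WHERE c.cust_type = ct.cust_type)
""").fetchall()

# Neither query has an ORDER BY, so compare the rows as sorted lists.
print(sorted(join_rows) == sorted(exists_rows))
```

The first form has to produce one row per customer before DISTINCT collapses them, while the second only needs one index lookup per customer type, which is where the gain comes from when customers vastly outnumber types.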
So, if I could find an equivalent and more efficient query, why can't the optimizer do the same?
My best guess is that it is for two main reasons:
SQL expresses logic:
Since SQL expresses high-level logic, would we really want the optimizer to "outsmart" us and our logic? I would enthusiastically shout "yes" if it were not for all the times I had to force the optimizer to pick the most efficient execution path. I think the idea could be to let the optimizer do its best (including revising our logic) but give us a "hint mechanism" to come to the rescue when something goes crazy (it would be like having the steering wheel and brakes in an autonomous car).
More choices = more time
Even the best RDBMS optimizers don't test ALL the possible execution paths, as they must be really fast: what good is it to optimize a query from 100ms to 10ms if I need to spend 100ms every time choosing the best path? And that's with the optimizer respecting our "high-level logic". If it also had to test all the equivalent SQL queries, the optimization time could grow many times over.
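To give a feeling for how fast the search space grows, here is a back-of-the-envelope sketch. It counts only the possible left-deep join orders for n tables (n!), which is a simplification I'm assuming for illustration: a real optimizer also has to pick a join algorithm and an access path at each step, and may consider other plan shapes, so the real space is larger still.

```python
import math

def left_deep_join_orders(table_count: int) -> int:
    """Number of ways to order `table_count` tables in a left-deep join."""
    return math.factorial(table_count)

# Print how quickly the number of candidate orders explodes.
for n in (2, 4, 8, 12):
    print(f"{n} tables: {left_deep_join_orders(n)} join orders")
```

Considering every equivalent rewrite of the SQL text on top of this would multiply the work again.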
Another good example of a query rewrite that no RDBMS is actually capable of doing is (from this interesting blog post):
SELECT t1.id, t1.value, SUM(t2.value) FROM mytable t1 JOIN mytable t2 ON t2.id <= t1.id GROUP BY t1.id, t1.value;
which can be written as follows (analytic functions required):
SELECT id, value, SUM(value) OVER (ORDER BY id) FROM mytable;
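Here is a quick check that the self-join and the window-function form compute the same running total, again as a sketch with SQLite through Python's sqlite3 module. It assumes the bundled SQLite is version 3.25 or newer (window functions were added there), that id is unique, and the sample rows are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (id INTEGER PRIMARY KEY, value INTEGER);
    -- Invented sample data.
    INSERT INTO mytable VALUES (1, 10), (2, 20), (3, 5), (4, 7);
""")

# Self-join: each row is paired with every row up to and including itself.
self_join_rows = conn.execute("""
    SELECT t1.id, t1.value, SUM(t2.value)
    FROM mytable t1
    JOIN mytable t2 ON t2.id <= t1.id
    GROUP BY t1.id, t1.value
    ORDER BY t1.id
""").fetchall()

# Window function: a single pass that accumulates the sum in id order.
window_rows = conn.execute("""
    SELECT id, value, SUM(value) OVER (ORDER BY id)
    FROM mytable
    ORDER BY id
""").fetchall()

print(self_join_rows == window_rows)
```

The self-join touches on the order of n² row pairs for n rows, while the window version can walk the table once, so this is a rewrite where the two forms are logically the same but very different in cost.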
select whatever from sometable where FKValue in (select FKValue from sometable_2 where other_value = :param)
It should be trivial to see how to restate that with an exists or a join.
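For completeness, the three spellings can be sketched as follows, using SQLite through Python's sqlite3 module. The column types, the id column, and the sample rows are my own assumptions, since the query above only names the tables and two columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical columns and data around the names used in the query.
    CREATE TABLE sometable (id INTEGER PRIMARY KEY, FKValue INTEGER);
    CREATE TABLE sometable_2 (FKValue INTEGER, other_value TEXT);
    INSERT INTO sometable VALUES (1, 100), (2, 200), (3, 300);
    INSERT INTO sometable_2 VALUES
        (100, 'x'), (100, 'x'), (200, 'y'), (300, 'x');
""")
params = {"param": "x"}

in_rows = conn.execute("""
    SELECT id FROM sometable
    WHERE FKValue IN (SELECT FKValue FROM sometable_2
                      WHERE other_value = :param)
    ORDER BY id
""", params).fetchall()

exists_rows = conn.execute("""
    SELECT s.id FROM sometable s
    WHERE EXISTS (SELECT 1 FROM sometable_2 s2
                  WHERE s2.FKValue = s.FKValue
                    AND s2.other_value = :param)
    ORDER BY s.id
""", params).fetchall()

# The join needs DISTINCT: sometable_2 may match the same row twice.
join_rows = conn.execute("""
    SELECT DISTINCT s.id FROM sometable s
    JOIN sometable_2 s2 ON s2.FKValue = s.FKValue
    WHERE s2.other_value = :param
    ORDER BY s.id
""", params).fetchall()

print(in_rows == exists_rows == join_rows)
```

Note that the join version is only equivalent once duplicates are removed, which is one more small reason an optimizer cannot blindly swap one form for another.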