32

I ask because so many of the questions I see in SQL amount to: "This is slow. How do I speed it up"? Or are tutorials stating "Do this this way and not that way as it's faster".

It seems to me that a large part of SQL is knowing just how an expression would be performed and from that knowledge chosing expression styles that perform better. This doesn't square with one aspect of declaritive programming - that of leaving the system to decide how best to perform the calculation with you just specifying what the calculation should produce.

Shouldn't an SQL engine not care about if you used in, exists or join if it is truly declarative shouldn't it just give you the correct answer in reasonable time if possible by any of the three methods? This last example is prompted by this recent post which is of the type mentioned in my opening paragraph.

Indexes

I guess the easiest example I could have used relates to creating an index for a table. The gumph here on w3schools.com even tries to explain it as something unseen by the user that is there for performance reasons. Their description seems to put SQL indices in the non-declarative camp and they are routinely added by hand for purely performance reasons.

Is it the case that their is somewhere an ideal SQL DB that is much more declarative than all the rest but because it is that good one doesn't hear about it?

6
  • @FrustratedWithFormsDesigner: I know exactly what that means. select whatever from sometable where FKValue in (select FKValue from sometable_2 where other_value = :param). It should be trivial to see how to restate that with an exists or a join.CommentedJun 3, 2013 at 20:47
  • Using similar reasoning I guess regular expressions are a more declarative method of expression as I rarely see performance questions answered by "you should write it this way to get better performance". I am wracking my brains and can half remember some question to do with negative look-behind or ahead assertions in a slow regexp where the answer was to rewrite the regexp in a different way to do the same in less time.
    – Paddy3118
    CommentedJun 3, 2013 at 21:16
  • Performance is an implementation detail. The performance of nearly any IN implementation could be comparable or better than EXISTS and JOIN if the query processor developers felt it was a priority.
    – JustinC
    CommentedJun 4, 2013 at 7:06
  • 1
    @JustinC, it seems to be more than a detail given the preponderance of performance oriented SQL questions and tips for a supposedly declarative language?
    – Paddy3118
    CommentedJun 4, 2013 at 12:34
  • There is no clear definition of a declarative programming language, and so it's meaningless to talk about. Some languages are higher-level than others, that's all.CommentedNov 6, 2016 at 17:15

3 Answers 3

31

SQL is theoretically declarative. But you know what they say about the difference between theory and practice...

At its core, the concept of "declarative programming" has never been truly effective, and likely never will until we have an AI-based compiler that's capable of looking at code and answering the question "what is the intention of this code?" intelligently, in the same way that the person who wrote it would. At the heart of every declarative language is a whole bunch of imperative code trying frantically to solve that problem without the help of an AI.

Often it works surprisingly well, because the most common cases are common cases, which the people who wrote the language's implementation knew about and found good ways to handle. But then you run up against an edge case that the implementor didn't consider, and you see performance degrade quickly as the interpreter is forced to take the code much more literally and handle it in a less efficient manner.

8
  • 4
    Never truly effective? SQL, LINQ, Knockout.js, Prolog, ELM language. You may want to check again. I am using mostly declarative technologies at the moment.
    – brian
    CommentedJun 3, 2013 at 21:01
  • 6
    @brian: And all of them degenerate rather quickly when you happen upon an edge case that no one thought of. I suppose I should have said "never truly effective in the general case."CommentedJun 3, 2013 at 21:06
  • 14
    @brian: I rarely hit an edge case in any of them that couldn't be solved within the framework. Yeah, that's the whole point: having to figure out a way to solve them within the framework because the framework isn't smart enough to solve it for you the way you originally declared it.CommentedJun 3, 2013 at 21:45
  • 3
    What about select ... for update? It seems an imperative command.CommentedMar 9, 2015 at 9:56
  • 1
    Hi @JacquesB, I thought the answer goes beyond what you state. My question is on the observation that SQL problems are to do with efficiency. Declaritive programming is to do with stating what rather than how a problem is solved. SQL is said to be declarative, but has many people aiding the how, i.e. at odds with this aspect of declarative. Correct is assumed, its a question and answer on efficiency .
    – Paddy3118
    CommentedSep 1, 2021 at 11:44
10

I was thinking of this some days ago after an SQL optimization. I think we can agree that SQL is a "declarative language" in the definition of Wikipedia:

Programming paradigm that expresses the logic of computation without describing its control flow

If you think how many things are done behind the curtains (looking at statistics, deciding if an index is useful, going for a nested, merged or hash join, etc..etc..) we must admit that we give just an high level logic, and the database took care of all the low level control flow logic.

Also in this scenario, sometimes the database optimizer needs some "hints" from the user to give the best results.

Another common definition of "declarative" language is (I can't find an authorative source):

Programming paradigm that expresses the desired result of computation without describing the steps to achieve it (also abbreviated with "describe what, not how")

If we accept this definition, we encounter the issues described by the OP.

The first issue is that SQL give us multiple equivalent ways to define "the same result". Probably that's a necessary evil: the more expressive power we give to a language, the more it is likely to have different ways to express the same thing.

As an example, I've been asked once to optimize this query:

 SELECT Distinct CT.cust_type, ct.cust_type_description from customer c INNER JOIN Customer_type CT on c.cust_type=ct.cust_type; 

Since the types were a lot less than the customer and there was an index on the cust_type on customer table, I've achieved a great improvement by rewriting it as:

 SELECT CT.cust_type, ct.cust_type_description from Customer_type CT Where exists ( select 1 from customer c Where c.cust_type=ct.cust_type); 

In this specific case, when I asked the developer what he wanted to achieve he told me "I wanted all the customer types for which I had at least one customer", that incidentally is exactly how the optimizer query could be described.

So, if I could find an equivalent and more efficient query, why can't the optimizer do the same?

My best guess is that it is for two main reasons:

SQL expresses logic:

since SQL expresses high-level logic, would we really want the optimizer to "outsmart" us and our logic? I would enthusiastically shout "yes" if it was not for all the times I had to force the optimizer pick the most efficient execution path. I think that the idea could be to allow for the optimizer to do its best (also revising our logic) but give us an "hint mechanism" to come to the rescue when something go crazy (it would be like having the wheel+brakes in an autonomous car).

More choices = more time

Even the best RDBMS optimizer don't test ALL the possible execution paths, as they must be really fast: how good would be to optimize a query from 100ms to 10ms if I need to spend every time 100ms choosing the best path? And that's with the optimizer respecting our "high-level logic". If it should also test all the equivalent SQL queries the optimizer time could grow multiple times.

Another good example of query rewrite the no RDBMS is actually capable of doing is (from this interesting blog post)

SELECT t1.id, t1.value, SUM(t2.value) FROM mytable t1 JOIN mytable t2 ON t2.id <= t1.id GROUP BY t1.id, t1.value; 

than can be written as this (Analytical functions required)

 SELECT id, value, SUM(t1.value) OVER (ORDER BY id) FROM mytable 
1
  • 2
    The example of rewriting the join to an exists is interesting. One rule of thumb I try to impress on SQL developers is that use of DISTINCT is a code smell -- either the query, or the data model, is very possibly wrong, and a different approach should be sought.CommentedNov 7, 2016 at 17:54
4

First lets clarify what "declarative" means. Declarative is a language paradigm, where the code is a specification of the desired end result, rather than (as in imperative programming) an explicit series of steps to execute.

As an example, here is how to draw a line using JavaScript, an imperative language:

var ctx = canvas.getContext('2d'); ctx.moveTo(25, 25); ctx.lineTo(125, 150); 

Here is how to draw a line with SVG, a declarative language:

<svg> <line x1="25" y1="25" x2="125" y2="150" /> </svg> 

The major difference is that the imperative code specifies an explicit sequence of operations. The SVG engine have somewhat more freedom in how to render the line (e.g it could start from either end!), but it does not need to "guess the intention" of the code any more than an imperative language. And there is certainly no need for an artificial intelligence!

You question implies that database engines need to to have certain level of optimizations before SQL qualifies as declarative. But "declarative" describes language semantics, not implementation, so whether database engines are dumb and naive or highly optimized does affect the classification of the language as declarative or not.

So is SQL declarative? DML statements like INSERT and UPDATE are imperative since order matters, DDL statements like CREATE TABLE or DROP TABLE are imperative. SQL scripts and stored procedures are imperative programs and may even may have mutable variables and loops (depending on the dialect).

When people say SQL is declarative they typically think only of queries, i.e. SELECT statements. Queries are declarative because they may specify multiple joins, conditions and transformations, but does not specify any particular order of operations. This is the job of the query planner, which transform the query into a sequence of concrete operations.

You seem to have some frustrations about lack of sophistication of the query planner is. But this is highly dependent on database engine. There is an universe of difference between Access or MySql or SQL server. So questions about optimizations should be always specify the database system in question.

Your concern about indexes is a different issue.

The relational model has a fundamental distinction between the logical and the physical level. Schemas and queries exist on the logical level. Caching, storage, partitioning, sharding, indexes etc. exist on the physical level. The rule is that configuration on the physical level affect performance but not logic. In other words, creating an index may change how fast a query returns a result, but will not change what result it returns.

Conversely, you should not need to know anything about the physical level when writing queries. The query planner should figure out what indexes are available and use them when it may improve performance.

But you suggest that the database should also create indexes as appropriate. This is not a desirable. Indexes also has a cost - otherwise you would just create indexes on everything. So for a professional databases you want a DBA to deliberately manage indexes. That said, there exist tools which can analyze database statistics and suggest indexes and other configuration. But you probably don't want such tools to modify the configuration unsupervised.

SQL does have "escape hatches" which allow you to provide physical-level hints (like what indexes to use) in order to optimize. But this should definitely only be necessary in edge case.

Manually managed indexes does not affect the declarativeness of SQL, but the use of index hints in queries does undermine the declarativeness and the logical/physical separation. If you discover that physical-level hints will improve performance in the general case, then you may have a bad or badly configured database engine.

2
  • So are you stating that there is a subset of SQL query features that is NOT declaritive, but that its use is negligible?
    – Paddy3118
    CommentedJul 27, 2021 at 6:02
  • 1
    @Paddy3118: It is a somewhat grey area, since SQL query hints is not part of the standard and differs from database engine to database engine. But for example SQL Server supports a FORCE ORDER query hint which arguably turns the query into imperative code. Such hints are generally advised against, so their use should ideally be negligible.
    – JacquesB
    CommentedJul 27, 2021 at 7:30

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.