EXPLAIN EXTENDED

How to create fast database queries

Adjacency list vs. nested sets: Oracle

Comments enabled. I *really* need your comment

Continuing the series:

What is better to store hierarchical data: nested sets model or adjacency list (parent-child) model?

For detailed explanations of the terms, see the first article in the series:

Today is Oracle time.

Adjacent sets require recursion, and Oracle has a native way to implement recursion in SQL statements. To do this it uses a special clause, CONNECT BY.

Unlike other systems, Oracle always supported recursion in SQL statements since version 2 (this is the name the first version of Oracle was released under).

This was done because by that time there already were several hierarchical databases on the market which are quite good for storing hierarchical data. If not for this clause, transition from a hierarchical DBMS to the relational DBMS would be too hard.

Oracle's way to query self-relations is somewhat different from recursive CTE's that other systems use:

  • Recursive CTEs can use an arbitrary set in recursive steps. The recursion base and recursion stack are not visible in further steps. The recursive operation is a set operation (usually a JOIN) on the recursion parameter (a set) and the result of the previous operation:

    Recursive CTE

  • CONNECT BY queries can only use the value of a single row in recursive steps.

    The recursion base and recursion stack are visible, the values outside the recursion stack are not.

    The recursive operation is a set operation on each row of the current row and the base rowset.

    Unlike recursive CTE, CONNECT BY operates row-wise, that is each row of the result spans its own recursion branch with its own stack:

    CONNECT BY

The main difference it that the CONNECT BY operation cannot produce anything that was not in the original rowset, while a recursive CTE can.

However, CONNECT BY has numerous benefits too: it's very fast, allows to track the hierarchy path easily (using a built-in funtion named SYS_CONNECT_BY_PATH) and implicitly sorts the recordset in tree order (with additional clause ORDER SIBLINGS BY to sort the siblings).

This method, of course, allows traversing adjacency lists easily.

To compare efficiency of adjacency lists and that of nested sets, let's create a sample table:

Table creation details

The table contains both adjacency list data and nested sets data, with 8 levels of nesting, 5 children of each parent node and 2,441,405 records.

As in the previous articles, we'll test performance of the three most used queries:

  • Find all descendants of a given node
  • Find all ancestors of a given node
  • Find all descendants of a given node up to a certain depth

All descendants

Nested sets

 SELECT SUM(LENGTH(hc.stuffing)) FROM t_hierarchy hp JOIN t_hierarchy hc ON hc.lft BETWEEN hp.lft AND hp.rgt WHERE hp.id = 42 

View query details

This is super fast, faster than in any system before that: only 10 ms. The plan is quite predictable: a UNIQUE scan to find the parent record and a range scan to find all children.

Adjacency list

 SELECT SUM(LENGTH(stuffing)) FROM t_hierarchy START WITH id = 42 CONNECT BY parent = PRIOR id 

View query details

This is quite fast too: 140 ms. The plan loops over the parent ranges for each id.

Note the TABLE ACCESS FULL in the end of the plan. Oracle reserves a right to do a FULL SCAN instead of a RANGE SCAN if it considers a certain parent range too large to use the index. However, this is recursion, not a single operation, and it's impossible to tell in advance which id values will be returned on each recursion step. That's why Oracle shows both methods, just in case, and chooses the most efficient method in runtime.

In our table, at most 5 rows share a parent (of more then 2 million), so it's very unlikely this plan branch will be ever chosen.

Find all ancestors

Nested sets

 SELECT hp.id, hp.parent, hp.lft, hp.rgt, hp.data FROM t_hierarchy hc JOIN t_hierarchy hp ON hc.lft BETWEEN hp.lft AND hp.rgt WHERE hc.id = 1000000 ORDER BY hp.lft 

View query details

The plan used is exactly the same as for the query to choose all descendants.

However, the range on lft is too broad and the query works for more than 3 seconds.

Adjacency list

 SELECT id, parent, lft, rgt, data, level FROM t_hierarchy START WITH id = 1000000 CONNECT BY id = PRIOR parent ORDER BY level DESC 

View query details

This query completes in less than 1 ms, which is instant.

Note that unlike recursive CTE's, Oracle supplies a built-in preudocolumn, level, that returns the current recursion level. This comes handy in cases like this.

Descendants up to a given level

Nested sets

 SELECT hc.id, hc.parent, hc.lft, hc.rgt, hc.data FROM t_hierarchy hp JOIN t_hierarchy hc ON hc.lft BETWEEN hp.lft AND hp.rgt WHERE hp.id = ? AND ( SELECT COUNT(*) FROM t_hierarchy hn WHERE hc.lft BETWEEN hn.lft AND hn.rgt AND hn.lft BETWEEN hp.lft AND hp.rgt ) <= 3 

View the query for node 42

View the query for node 31,415

As in other systems, performance heavily depends on how many descendants does the record have.

For the node 42 which is close to the root node and has lots of descendants, the query runs for 122 seconds. For the node 31,415, the query takes 1.5 ms (next to instant).

Adjacency list

 SELECT id, parent, lft, rgt, data FROM t_hierarchy START WITH id = ? CONNECT BY parent = PRIOR id AND level <= 3 

Limiting the recursion depth query is very simple in Oracle: we just add the additional condition to CONNECT BY clause constraining the limit.

View the query for node 42

View the query for node 31,415

Both queries complete in 1 ms, that is instant.

Summary

Oracle implements a special clause, CONNECT BY, that makes traversing adjacency lists in general and parent-child relationships in particular very easy, since this is exactly what this clause was intended for.

Except for finding all descendants of a single node (without finding out or limiting the depth), nested sets model is less efficient in Oracle than adjacency lists.

Adjacency lists are not only more efficient but the queries using them are more elegant and easy to maintain due to the native support.

That's why adjacency list model should be used instead of nested sets model in Oracle too, as well as in SQL Server and PostgreSQL 8.4.

Written by Quassnoi

September 28th, 2009 at 11:00 pm

Posted in Oracle

Leave a Reply

close