The Mimer SQL server contains a highly advanced SQL optimizer. The optimizer performs numerous transformations and computes the most efficient access path to get the query results.
It is possible to view the results of the optimization process to help in the construction of efficient queries. In the bsql command line utility (aka. Batch SQL), the optimizer output may be viewed in its raw format. The output is xml-based.
It is also possible to view a graphical output of the same data in the DbVisualizer Pro front-end. Note that this functionality is not available in the version bundled with the Mimer SQL distribution. Download a DbVisualizer Pro trial license and try it out! (https://www.dbvis.com/download/).
In this article some reference material about explain output is given, followed by explain discussions for some sample queries. The examples are based on the Mimer example database created by the exload program. When created, login as MIMER_STORE with password GoodiesRUs if you want to try it.
For an explanation of the explain functionality in older Mimer SQL versions, using the previous SQL compiler, please see the article Mimer SQL Engine Explain.
Explain tree nodes
Node name | Explanation |
---|---|
select | The SQL statement is a select statement. |
insert | The SQL statement is an insert statement. |
update | The SQL statement is an update statement. |
delete | The SQL statement is a delete statement. |
tempTable | The optimizer has decided to use a temporary table for the results thus far in the query. |
union | The optimizer is performing a union all of two or more result sets (i.e. concatenating the results without temporary table). An SQL UNION operation is typically translated to a tempTable containing a union. |
subselect | A subselect node is a subquery such as an exists, a scalar subquery, or a quantified predicate (IN-clause etc.). A subselect will be executed for each row found in the outer table(s). |
constSubselect | A constSubselect node is a subquery that has no dependencies to outer tables and returns zero or one row. Examples of this are non-correlated exists and scalar subqueries. These are only executed once and the result is reused for each row found in the outer table(s). |
innerJoin | The innerJoin node is used when two tables are joined. Only rows that match the join condition are returned. |
outerJoin | The outerJoin node is used for an outer join query. When there is no match in the outer join, null values are returned. |
crossJoin | This is a join where there are no join conditions between two tables. Every row in the first table will match all rows returned by the second table. The query may be lacking appropriate conditions when this occurs. Double check that your query returns the desired results. |
table | This node is used to access the contents of a single table and/or index. |
Explain tree attributes
Attribute | Used by | Explanation |
---|---|---|
name | table | Table name, including alias, used in SQL statement. |
order | table | Scan order of the table/index in the query. |
index | table | Here we see the name of the access path picked by the optimizer. The name depends on the type, either name of the index or the constraint. When a base table is read the primary key or internal key is always used. |
scan | table |
|
type | table |
|
indexLookupOnly | table | When accessing an index, first the index is read and then the base table. However, if an SQL statement only uses column values that are present in the index then indexLookupOnly can be used. In this case only the index is read and no corresponding base table lookup is made. Please note that when a primary key or internal key is used, only the base table is read even though this option is not set. |
subselect | All nodes | A subselect node is a subquery such as an EXISTS, a scalar subquery, or a quantified predicate (IN-clause etc.). A subselect will be executed for each row found in the outer table(s). |
cost | All nodes | The estimated cost for the accessing the table. |
hits | All nodes | The number of hits that will be returned. |
visits | select | The number or rows that the system will read from a table or temporary table. When an index is accessed this includes both index table lookup and base table lookup (unless indexLookupOnly is used in which case the base table is never accessed). For parent nodes (such as a join node) this is the accumulated value for underlying accesses. |
tempWrites | tempTable | Number of rows written to temporary table. |
rows | table | This is the number of rows in the table. This is the value from the last time update statistics was run on the table. If there are no statistics collected for table the optimizer will look in the table to estimate the number of rows. |
Join
Let us start with a simple query with a join:
select cou.country, cur.currency from currencies cur, countries cou where cou.country in ('Belgium', 'Norway') and cou.currency_code = cur.code;
The query returns which currency is used in Belgium and Norway. The explain plan in DbVisualizer looks as follows:
The diagram is read starting in the lower left corner and then working your way up to the right. If in doubt, check the scan order field to see the execution order. The same query in bsql is shown here. Note the command set explain on is given first to see the explain plan:
SQL> set explain on; SQL> select cou.country, cur.currency from currencies cur, countries cou SQL& where cou.country in ('Belgium', 'Norway')* SQL& and cou.currency_code = cur.code; Start of explain result <select cost="6" hits="2" visits="6"> <innerJoin cost="6" hits="2" visits="6"> <table name="countries cou" order="1" index="cnt_country_exists" scan="leadingKeys" type="unique key" cost="4" hits="2" visits="4" rows="234"/> <table name="currencies cur" order="2" index="SQL_PRIMARY_KEY_5715" scan="unique" type="primary key" cost="1" hits="1" visits="1" rows="162"/> </innerJoin> </select> End of explain result country currency ================================================ Belgium Euros === Norway Norwegian Kroner === 2 rows found
Let us examine the explain output in some detail. Note that the XML contains exactly the same information as is used by DbVisualizer to display the explain graph.
The join order picked by the optimizer is to start with the countries table. This table is then joined with the currencies table.
<table name="countries cou" order="1" index="cnt_country_exists" scan="leadingKeys" type="unique key" cost="4" hits="2" visits="4" rows="234"/>
Order 1 shows that the countries table is read first. The unique key cnt_country_exists index is used to scan the table. We have a condition on the first column in the index (cou.country = ‘Belgium’) which is why the scan is leadingKeys. In DbVisualizer you can see all the columns of cnt_country_exists under Access Paths:
The index cnt_country_exists has both the country column and the primary key code column. The visit count is 4 because two rows are read in the index, and two rows from the base table. This will result in a hit-count of 2 rows. The statistics indicates there are 234 rows in the countries table.
Let us now examine the join-node:
<innerJoin cost="6" hits="2" visits="6"> <table name="countries cou" order="1" index="cnt_country_exists" scan="leadingKeys" type="unique key" cost="4" hits="2" visits="4" rows="234"/> <table name="currencies cur" order="2" index="SQL_PRIMARY_KEY_5715" scan="unique" type="primary key" cost="1" hits="1" visits="1" rows="162"/> </innerJoin>
The join node contains the cost of processing the two tables. The number of visits and hits are as follows:
Visits (6) = visits in countries (4) + hits in countries (2) * visits in currencies (1) Hits (2) = hits in countries (2) * hits in currencies (1)
When there are no temporary tables involved the cost is equal to the total number of visits.
Let us see what would happen if we were to force the opposite join order. This is done by using the {order} clause in the from-list:
SQL> set explain on; SQL> select cou.country, cur.currency SQL> from {order} currencies cur, countries cou SQL& where cou.country in ('Belgium', 'Norway') SQL& and cou.currency_code = cur.code; Start of explain result <select cost="486" hits="162" visits="486"> <innerJoin cost="486" hits="162" visits="486"> <table name="currencies cur" order="1" index="SQL_PRIMARY_KEY_5715" scan="sequential" type="primary key" cost="162" hits="162" visits="162" rows="162"/> <table name="countries cou" order="2" index="SQL_FOREIGN_KEY_5723" scan="leadingKeys" type="foreign key index" cost="2" hits="1" visits="2" rows="234"/> </innerJoin> </select> End of explain result
This was clearly a bad idea. We now got a sequential scan of the currencies table. Since we only had a join condition (cou.currency_code = cur.code) on this table which cannot be evaluated until we read both tables we get 162 hits. The countries table now uses the foreign key index to find the values. The inner join cost was:
Visits (486) = visits in currencies (162) + hits in currencies (162) * visits in countries (2) Hits (162) = hits in currencies (162) * hits in countries (1)
Inner join, outer join and cross join are all computed in this way.
Temporary Tables
There are several different types of temporary tables depending on which operation is being handled. For example distinct, group by and order by. Tables used for order by and group by are both inserted to and read. This is seen as both a write count and a visit count. Distinct temporary tables are only used to avoid duplicates and never have to be read. In this case we have a write count, but no visit count. It is a bit more expensive to perform write operations than read operations. The optimizer currently uses a factor of ~1.4 to estimate the cost.
Let us look at an example:
SQL> set explain on; SQL> select cou.country from currencies cur, countries cou SQL& where cou.currency_code = cur.code SQL& and cur.currency in ('Swiss Francs', 'Pulas', 'Czech Korony') SQL& order by cou.country; Start of explain result <select cost="175" hits="3" visits="171" tempWrites="3"> <tempTable cost="172" class="TempTableOrderBy" hits="3" visits="3" tempWrites="3"> <innerJoin cost="168" hits="3" visits="168"> <table name="currencies cur" order="1" index="SQL_PRIMARY_KEY_5715" scan="sequential" type="primary key" cost="162" hits="3" visits="162" rows="162"/> <table name="countries cou" order="2" index="SQL_FOREIGN_KEY_5723" scan="leadingKeys" type="foreign key index" cost="2" hits="1" visits="2" rows="234"/> </innerJoin> </tempTable> </select> End of explain result
As can be seen the optimizer estimates that 3 rows will be both written and read from the temporary table. The cost (172) = innerJoin cost (168) + tempWrites (3) * 1.4. In the next level we add the cost for reading (visits=”3″) so the total cost is 175.
Subqueries
There are three variations of optimization of subqueries that are important to understand:
- The simplest one is where the subquery is executed once for each row of the outer query.
- Sometimes it is possible to execute the subquery as a join with the outer table(s). In this case the subquery can occur somewhere in the join order. Depending on the actual conditions a temporary table may be needed to eliminate duplicates in this case.
- If the subquery is not correlated, i.e. has no conditions that relate to the outer query and only return zero or one rows, then the query can be executed once and for all and the result is then reused as the outer tables are processed.
We will look at examples of the above three cases.
select cou.country from countries cou where country in ('Andorra','Angola','Anguilla','Antigua and Barbuda') and cou.currency_code in (select cur.code from currencies cur where exchange_rate > 0.3);
For each hit in the countries table we run the subselect. In the query the optimizer has used the primary key to look up the corresponding row in the currencies table. The subselect is evaluated 4 times and each has a cost of 1.
We will now take a look at a query where the subquery participates in the outer query join.
SQL> set explain on; SQL> select cur.currency SQL& from currencies cur SQL& where exists (select 1 from countries c SQL& where c.country in ('Sweden' ,'Japan') SQL& and c.currency_code = cur.code); Start of explain result <select cost="9" hits="2" visits="6" tempWrites="2"> <tempTable cost="9" class="TempTableJoinBySubselect" tempWrites="2"> <innerJoin cost="6" hits="2" visits="6"> <table name="countries c" order="1" index="cnt_country_exists" scan="leadingKeys" type="unique key" cost="4" hits="2" visits="4" rows="234"/> <table name="currencies cur" order="2" index="SQL_PRIMARY_KEY_5715" scan="unique" type="primary key" cost="1" hits="1" visits="1" rows="162"/> </innerJoin> </tempTable> </select> End of explain result
In the explain output the exists now participates as part of the join with the currencies table. An extra temporary table is needed in this query to eliminate duplicates.
And finally the third type is the constant subquery.
SQL> set explain on; SQL> select cur.currency SQL& from currencies cur SQL& where cur.exchange_rate < (select exchange_rate SQL& from currencies xcur, countries cou SQL& where cou.country = 'Belgium' SQL& and cou.currency_code = xcur.code); Start of explain result <select cost="165" hits="54" visits="165"> <constSubselect cost="3" hits="1" visits="3"> <innerJoin cost="3" hits="1" visits="3"> <table name="countries cou" order="1" index="cnt_country_exists" scan="leadingKeys" type="unique key" cost="2" hits="1" visits="2" rows="234"/> <table name="currencies xcur" order="2" index="SQL_PRIMARY_KEY_5715" scan="unique" type="primary key" cost="1" hits="1" visits="1" rows="162"/> </innerJoin> </constSubselect> <table name="currencies cur" order="3" index="SQL_PRIMARY_KEY_5715" scan="sequential" type="primary key" cost="162" hits="54" visits="162" rows="162"/> </select> End of explain result
The select using currencies and countries is a scalar subselect that only returns a single value. There are no references to the outer table currencies cur. Therefore the constSubelect node can be used. It is executed only once, and then the result is used for each row in the currencies cur table. In the query above the cost is much cheaper than evaluating the subquery for each row in currencies cur (instead of a cost of 165 we would get 660 = 165 + 165 * 3).
Union
Let us look at an example of how union costs are computed:
SQL> set explain on; SQL> select c.category from categories c SQL& union SQL& select f.format from formats f; Start of explain result <select cost="30" hits="13" visits="16" tempWrites="10"> <tempTable cost="30" class="TempTableUnion" visits="3" tempWrites="10"> <union cost="13" hits="13" visits="13"> <table name="formats f" order="1" index="fmt_primary_key" scan="sequential" type="primary key" cost="10" hits="10" visits="10" rows="10"/> <table name="categories c" order="2" index="ctg_primary_key" scan="sequential" type="primary key" cost="3" hits="3" visits="3" rows="3"/> </union> </tempTable> </select> End of explain result
In a union duplicates are eliminated. This can be seen as a tempTable node. The cost of the union is the cost of each branch and then the temporary table handling. Remember that writes cost an extra ~1.4 in the cost. It can be noted that the optimizer has reordered the union branches because the select from the categories tables only returns distinct values because there is a unique index on the category column. When the last union branch is distinct (i.e. that branch alone does not return any duplicate values), the system will only check if the value has been seen in earlier union branches. This can be seen as tempWrites=”10″ (from the first union branch) and visits=”3″ to check that the value has not occurred before. The total cost (30) = union cost (13) + tempWrites 10 * 1.4 + temp table visits (3).
The optimizer does not know anything about the actual content of the two columns. However, the SQL programmer may know that the values in the two tables are distinct. If that is the case the query can be rephrased as:
SQL> set explain on; SQL> select c.category from categories c SQL& union all SQL& select f.format from formats f; Start of explain result <select cost="13" hits="13" visits="13"> <union cost="13" hits="13" visits="13"> <table name="categories c" order="1" index="ctg_primary_key" scan="sequential" type="primary key" cost="3" hits="3" visits="3" rows="3"/> <table name="formats f" order="2" index="fmt_primary_key" scan="sequential" type="primary key" cost="10" hits="10" visits="10" rows="10"/> </union> </select> End of explain result
The change is that UNION ALL is used instead of UNION. This will, of course, result in a more efficient query as no temporary table handling is needed. The cost has subsequently dropped from 30 to 13!