Retrieving Data from More than One Table
The examples presented up to now in this chapter have illustrated the essential features of simple SELECT statements with data retrieval from single tables. However, much of the power of SQL lies in the ability to perform joins through a single statement, i.e. to select data from two or more tables, using the search condition to link the tables in a meaningful way.
The Join Condition
In retrieving data from more than one table, the search condition or join condition specifies the way the tables are to be linked. For example:
List the product name in addition to the EAN and price for any items with a release date in the future:SELECT product, ean_code, price FROM items, products WHERE items.release_date > CURRENT_DATE AND products.product_id = items.product_id;
The join condition here is PRODUCTS.PRODUCT_ID = ITEMS.PRODUCT_ID, which relates the product identifier in table ITEMS (where codes are listed) to the product identifier in table PRODUCTS (where names are listed).
Conceptually, the join first establishes a table containing all combinations of the rows in PRODUCTS with the rows in ITEMS, then selects those rows in which the two PRODUCT_ID values are equal. See Conceptual Description of the Selection Process for a fuller description of the conceptual SELECT process.
This does not necessarily represent the order in which the operations are actually performed; the order of evaluation of a complex SELECT statement is determined by the SQL optimizer, regardless of the order in which the component clauses are written.
Without the join condition, the result is a cross product of the columns in the tables in question, containing all possible combinations of the selected columns, for example:SELECT product, ean_code, price FROM items, products WHERE items.release_date > CURRENT_DATE;
It is easy to see that a carelessly formulated join query can produce a very large result table. Two tables of 100 rows each, for instance, give a cross product with 10,000 rows; three tables of 100 rows each give a cross product with 1,000,000 rows!
The risk of generating large (erroneous) result tables is particularly high in interactive SQL (e.g. when using Mimer BSQL), where queries are so easily written and submitted.
In simple joins, all tables used in the join are listed in the FROM clause of the SELECT statement. This is in distinction to nested joins, where the search condition for one SELECT is expressed in terms of another SELECT, see Nested Selects.
An example of a simple join is the query described in The Join Condition.
Example:SELECT product, ean_code, price FROM items, products WHERE items.product_id = products.product_id AND release_date > CURRENT_DATE;
The form SELECT * may be used in a join query, but since this selects all columns in the result set, at least one column is usually duplicated.
Example:SELECT * FROM items JOIN products ON items.product_id = products.product_id AND release_date > CURRENT_DATE;
Columns in the join query that are uniquely identified by the column name may be specified by name alone. Columns that have the same name in the joined tables must be qualified by their respective table names.
There is an alternative formulation of the query above:SELECT product, ean_code, price FROM items JOIN products ON items.product_id = products.product_id AND release_date > CURRENT_DATE;
All predicates that can be used in a where clause, except sub-selects, can be used in an on clause.
The join clause can be used as a statement on its own:items JOIN products ON items.product_id = products.product_id;
oritems NATURAL JOIN products;
About Natural Joins
A natural join, joins the table on the condition of equality between any columns with the same name, in the two tables. In the first example, all columns from the two tables are present in the result. In the second example the join columns will only occur once. Thus, in the first case, the PRODUCT_ID column appears twice in the result, while there is only one occurrence of this column in the second result.
Nesting Join Clauses
It is possible to nest join-clauses, for example:
List the category in addition to the EAN and price for any items with a release date in the future:SELECT ean_code, price, category FROM items NATURAL JOIN formats JOIN categories ON categories.category_id = formats.category_id WHERE release_date > CURRENT_DATE;
Complex Search Conditions and Joins
A join query can join any number of tables using complex search conditions to select the relevant information from each table.
List the product for any items with a release date in the future along with the item price in both Swedish and Danish crowns (SEK and DKK respectively):SELECT product, CAST(price * exchange_rate AS DECIMAL(12,2)) AS cost, currency FROM items, products, currencies WHERE release_date > CURRENT_DATE AND products.product_id = items.product_id AND (currencies.code = 'SEK' OR currencies.code = 'DKK') ORDER BY product, currency;
In formulating a search condition for a join query, it can help to write out the columns that would appear in a complete cross-product of the tables. The search condition is then formulated as though the query was a simple SELECT from the cross-product table.
The joins in the previous sections were all inner joins. In an inner join between two tables, only rows that fulfill the join condition are present in the result.
An outer join, on the contrary, contains non-matching rows as well. The outer join has two options, LEFT and RIGHT.
Left Outer Join
Example:SELECT ean_code, release_date, producer FROM items LEFT OUTER JOIN producers ON items.producer_id = producers.producer_id WHERE ean_code >= 800000000000 ORDER BY ean_code;
In the example above all rows from the table to the left in the join clause, i.e. ITEMS, are present in the result; non-matching rows from the PRODUCERS table are filled with null values in the result.
Observe the difference in result for the next statement and the previous one.SELECT ean_code, release_date, producer FROM items LEFT OUTER JOIN producers ON items.producer_id = producers.producer_id AND ean_code >= 800000000000 ORDER BY ean_code;
The reason is that conditions in the where clause are applied to the result of the join-clause and not to the joined tables as is the case with the conditions in the on-clause.
Right Outer Join
A right outer join will take all records from the table to the right in the join-clause.
Nesting Outer Joins
As with inner joins, it is possible to nest join-clauses. Nested joins can be of different types, i.e. both inner and outer joins.
The result of nested outer joins can be somewhat unexpected though, as it is the result of the first join-clause that is the left table in the next join, and not the right table in the first join-clause.
A form of SELECT, called a subselect, can be used in the search condition of a SELECT statement to form a nested query.
The main SELECT statement is then referred to as the outer select.
Select the products that have a release date in the future.SELECT product FROM products WHERE product_id IN (SELECT product_id FROM items WHERE release_date > CURRENT_DATE);
To see how this works, evaluate the subselect first:SELECT product_id FROM items WHERE release_date > CURRENT_DATE;
Then use the result of the subselect in the search condition of the outer select:SELECT product FROM products WHERE product_id IN (30206, 30618, 31082, 31083);
A subselect can be used in a search condition wherever the result of the subselect can provide the correct form of the data for the search condition.
Thus a subselect used with '=' must give a single value as a result.
A subselect used with IN, ALL or ANY must give a set of single values, see Retrieval with ALL, ANY, SOME.
A subselect used with EXISTS may give any result, see Retrieving Data Using EXISTS and NOT EXISTS.
Examples:WHERE column = (subselect) WHERE column IN (subselect) WHERE column = ALL (subselect) WHERE column = ANY (subselect) WHERE EXISTS (subselect)
Subselects and ORDER BY Clauses
Subselects cannot include ORDER BY clauses. The UNION operator can be used to combine two or more subselects in more complex statements, see Union Queries.
Many nested queries can equally well be written as simple joins. For example:
Select the products that have a release date in the future.SELECT product FROM products WHERE product_id IN (SELECT product_id FROM items WHERE release_date > CURRENT_DATE);
or alternativelySELECT DISTINCT product FROM products, items WHERE products.product_id = items.product_id AND items.release_date > CURRENT_DATE;
Both these queries give exactly the same result. In most cases, the choice of which form to use is a matter of personal preference. Choose the form which you can understand most easily; the clearest formulation is least likely to cause problems.
Subselects in Queries
Queries may contain any number of subselects, for example:
List the producers (manufacturers) which have items that are more expensive than any of the items produced by Sony.SELECT producer FROM producers WHERE producer_id IN (SELECT producer_id FROM items WHERE price > (SELECT MAX(price) FROM items WHERE producer_id = (SELECT producer_id FROM producers WHERE producer = 'SONY')));
Note the balanced parentheses for the nested levels.
It is particularly important at this level of complication to think carefully through the query to make sure that it is correctly formulated.
Often, writing some of the levels as simple joins can simplify the structure. The previous example may also be written:SELECT DISTINCT producer FROM producers, items WHERE producers.producer_id = items.producer_id AND price > (SELECT MAX(price) FROM items, producers WHERE items.producer_id = producers.producer_id AND producer = 'SONY');
Ordering Nested Queries
The ORDER BY clause may only be used in outer SELECT statements and not in subselects.
The following example is correct:SELECT country, code FROM countries WHERE currency_code IN (SELECT code FROM currencies WHERE currency IN ('Rand', 'US Dollars', 'Dong')) ORDER BY country;
The following example is incorrect:SELECT country, code FROM countries WHERE currency_code IN (SELECT code FROM currencies WHERE currency IN ('Rand', 'US Dollars', 'Dong') ORDER BY code);
A correlation name is a temporary name given to a table to represent a logical copy of the table within a query.
Correlation names can be up to a maximum of 128 characters long.
There are three uses for correlation names:
Simplifying Complex Queries Using Correlation Names
Using short correlation names into complicated queries can make the query easier to write and understand, particularly when qualified table names are used:SELECT mimer_store_music.artists.artist, mimer_store.product_details.* FROM mimer_store.product_details, mimer_store_music.titles, mimer_store_music.artists WHERE mimer_store_music.titles.item_id = mimer_store.product_details.item_id AND mimer_store_music.artists.artist_id = mimer_store_music.titles.artist_id ORDER BY mimer_store_music.artists.artist;
may be rewrittenSELECT art.artist, pdt.* FROM mimer_store.product_details AS pdt, mimer_store_music.titles AS ttl, mimer_store_music.artists AS art WHERE ttl.item_id = pdt.item_id AND art.artist_id = ttl.artist_id ORDER BY art.artist;
The keyword AS in the FROM clause may be omitted, but is recommended for clarity. Do not confuse AS in the FROM clause (defining a correlation name) with AS in the select list, see Setting Column Labels, defining a label.
About Correlation Names
Correlation names are local to the query in which they are defined.
When a correlation name is introduced for a table name, all references to the table in the same query must use the correlation name.
The following expression is not accepted:... FROM mimer_store.product_details AS pdt, mimer_store_music.titles AS ttl, ... WHERE ttl.item_id = mimer_store.product_details.item_id
Joining a Table with Itself Using a Correlation Name
Joining a table with itself allows you to compare information in a table with other information in the same table. This can be done with a correlation name.
Select all currencies with the same exchange rate:SELECT currencies.currency, currencies.code, currencies.exchange_rate FROM currencies, currencies AS copy WHERE currencies.exchange_rate = copy.exchange_rate AND currencies.currency <> copy.currency;
Here, the table CURRENCIES is joined to a logical copy of itself called COPY.
The first search condition finds pairs of currencies with the same exchange rate against the euro, and the second eliminates 'pairs' with the same currency name. Without the second condition in the search criteria, all currencies would be selected!
Without correlation names, this kind of query cannot be formulated. The following query would select all the currencies from the table:SELECT currency, code, exchange_rate FROM currencies WHERE currencies.exchange_rate = currencies.exchange_rate;
Outer References in Subselects Using Correlation Names
In some constructions using subselects, a subselect at a lower level may refer to a value in a table addressed at a higher level. This kind of reference is called an outer reference.SELECT currency FROM currencies WHERE EXISTS (SELECT * FROM countries WHERE currency_code = currencies.code);
This kind of query processes the subselect for every row in the outer select, and the outer reference represents the value in the current outer select row. In descriptive terms, the query says 'For each row in CURRENCIES, select the CURRENCY column if there are rows in COUNTRIES containing the current CODE value'.
If the qualifying name in an outer reference is not unambiguous in the context of the subselect, a correlation name must be defined in the outer select.
A correlation name may always be used for clarity, as in the following example:SELECT currency FROM currencies AS c WHERE EXISTS (SELECT * FROM countries WHERE currency_code = c.code);
Retrieving Data Using EXISTS and NOT EXISTS
EXISTS is used to check for the existence of some row or rows which satisfy a specified condition. EXISTS differs from the other operators in that it does not compare specific values; instead, it tests whether a set of values is empty or not. The set of values is specified as a subselect.
The subselect following the EXISTS clause most often uses of 'SELECT *' as opposed to 'SELECT column-list' since EXISTS only searches to see if the set of values addressed by the subselect is empty or not - a specified column is seldom relevant in the subquery.
EXISTS (subselect) is true if the result set of the subselect is not empty
NOT EXISTS (subselect) is true if the result set of the subselect is empty
SELECT statements with EXISTS almost always include an outer reference linking the subselect to the outer select.
Examples of EXISTS
Find all currencies that are used in the COUNTRIES table:SELECT currency FROM currencies AS c WHERE EXISTS (SELECT * FROM countries WHERE currency_code = c.code);
Without the outer reference, the select becomes a conditional 'all-or-nothing' statement: perform the outer select if the subselect result is not empty, otherwise select nothing.
List all products where the producer (manufacturer) is not known:SELECT product FROM products WHERE EXISTS (SELECT * FROM items WHERE producer_id IS NULL AND product_id = products.product_id);
Examples of NOT EXISTS
The next example illustrates NOT EXISTS:
List all products where the producer (manufacturer) is not known:SELECT product FROM products WHERE NOT EXISTS (SELECT * FROM items NATURAL JOIN producers WHERE product_id = products.product_id);
Negated EXISTS clauses must be handled with care. There are two semantic 'opposites' to EXISTS, with very different meanings:WHERE EXISTS (SELECT * FROM artists WHERE artist = 'Enigma')
is true if at least one artist is called ENIGMA.WHERE NOT EXISTS (SELECT * FROM artists WHERE artist = 'Enigma')
is true if no artist is call ENIGMA.
ButWHERE EXISTS (SELECT * FROM artists WHERE artist <> 'Enigma')
is true if at least one artist is not called ENIGMA.WHERE NOT EXISTS (SELECT * FROM artists WHERE artist <> 'Enigma')
is true if no artist is not called ENIGMA, that is if every artist is called ENIGMA.
The double negative in the previous example is an SQL implementation of the universal quantifier FORALL, see 'A Guide to DB2' by C. J. Date for more information on EXISTS and FORALL.
Retrieval with ALL, ANY, SOME
Subselects that return a set of values may be used in the quantified predicates ALL, ANY or SOME. ThusWHERE PRICE < ALL (subselect)
selects rows where the price is less than every value returned by the subselectWHERE PRICE < ANY (subselect)
selects rows where the price is less than at least one of the values returned by the subselect
Select countries that have an exchange rate of less than one:SELECT country FROM countries WHERE currency_code <> ALL (SELECT code FROM currencies WHERE exchange_rate >= 1.0);
If the result of the subselect is an empty set, ALL evaluates to true, while ANY or SOME evaluates to false.
An alternative to using ALL, ANY or SOME in a value comparison against a general sub-select, is to use EXISTS or NOT EXISTS to see if values are returned by a sub-select which only selects for specific values. For example:
Select countries where the associated currency code contains the letter 'E' as the middle character in the code:SELECT country FROM countries WHERE currency_code = ANY (SELECT code FROM currencies WHERE code LIKE '_E_');
is equivalent to:SELECT country FROM countries AS c WHERE EXISTS (SELECT * FROM currencies WHERE code LIKE '_E_' AND code = c.currency_code);
The UNION operator combines the results of two or more subselect clauses.
UNION first merges the result tables specified by the separate subselects and then eliminates duplicate rows from the merged set.
Select the codes for currencies and countries that start with the letter 'D':SELECT code FROM currencies WHERE code LIKE 'D%' UNION SELECT currency_code FROM countries WHERE country LIKE 'D%';
The result is obtained by merging the results of the two subselects and eliminating duplicates:SELECT code SELECT currency_code FROM currencies FROM currencies WHERE code LIKE 'D%; WHERE country LIKE 'D%';
and the UNION gives the result table:
To retain duplicates in the result table, use UNION ALL in place of UNION, see the Mimer SQL Reference Manual, Chapter 6, UNION or UNION ALL, for details.
Columns which are merged by UNION must have compatible data types (numerical with numerical, character with character).
Subselects addressing more than one result column are merged column by column in the order of selection. The number of columns addressed in each subselect must be the same.
About Column Names
The column names in the result of a UNION are taken from the names in the first subselect. Use labels in the first subselect to assign different column names to the result table:
Merge the codes and names of currencies where the code begins with 'D' with the codes and names of the countries where the country begins with 'D':SELECT code, currency AS currency_or_country FROM currencies WHERE code LIKE 'D%' UNION SELECT currency_code, country FROM countries WHERE country LIKE 'D%';
Subselects merged by UNION may not include an ORDER BY clause. However, the result of the UNION query may be ordered with an ORDER BY clause placed after the last query in the UNION.
UNION may not be used within a nested subselect. However, the results of nested queries may be joined by UNION.
More about Unions
Unions can also be used to combine information from the same table:
Find the lowest and highest exchange_rates:SELECT 'Highest', MAX(exchange_rate) AS rate FROM currencies UNION SELECT 'Lowest', MIN(exchange_rate) FROM currencies ORDER BY rate;
Outer Joins and Unions
Unions can also be used to perform outer joins, joining information in a table or tables with information not listed in those tables (i.e. information that is null). For example:
List the EAN, price and producer for each item where the EAN is 800000000000 or greater. Include a row for items where the producer is not known:SELECT ean_code, release_date, producer FROM items NATURAL JOIN producers WHERE ean_code >= 800000000000 UNION SELECT ean_code, release_date, '*** undefined ***' FROM items WHERE ean_code >= 800000000000 AND NOT EXISTS (SELECT * FROM producers WHERE producer_id = items.producer_id) ORDER BY ean_code;
Note: UNION statements including DISTINCT treat NULL values as duplicates.
Unions and NULL
In UNION queries, the keyword NULL can be included in the column list of one or both of the queries, so that columns not represented in all of the queries in the statement are retained in the result set.
Upright Database Technology AB
Voice: +46 18 780 92 00
Fax: +46 18 780 92 40