|
|
Handling NULL Values
NULL values require special handling in SQL queries. NULL represents an unknown value, and strictly speaking NULL is never equal to NULL. (NULL values are however treated as equal for the purposes of GROUP BY, DISTINCT and UNION).
Searching for NULL
WHERE column = NULLwill not retrieve any rows since NULL is not equal to anything. The condition for selecting NULL values is
WHERE column IS NULLThe negated form (WHERE column IS NOT NULL) selects values which are not NULL (i.e. values which are known).
Find the names of the persons who made the reservations for those customers who have not yet checked in to the Hotel Skyline:
Note: 'Not checked in' is represented by NULL in the CHECKIN column.
SELECT RESERVED_FNAME, RESERVED_LNAME FROM BOOK_GUEST WHERE CHECKIN IS NULL AND HOTELCODE = (SELECT HOTELCODE FROM HOTEL WHERE NAME = 'SKYLINE');Result:
RESERVED_FNAME RESERVED_LNAME OMAR CHAFIR AGNETA ERIKSSON SVEN LINDHOLM HENRIK PIHL URBAN FRANSSONFind the names of the guests who have checked in to the Hotel Laponia.
SELECT GUEST_FNAME, GUEST_LNAME FROM BOOK_GUEST WHERE CHECKIN IS NOT NULL AND HOTELCODE = (SELECT HOTELCODE FROM HOTEL WHERE NAME = 'LAPONIA');Result:
GUEST_FNAME GUEST_LNAME CHRISTOPHER DATE STEN JOHANSEN STEFAN HANSEN GUNNAR ALVE NILS KRISTOFERSEN LARS HOLMER KNUT KULLMER JUDITH SMITH ADOLF SCHMIDT LAILA ZETTERBERG MATS HANSSONNull values in ALL, ANY, IN and EXISTS Queries
Null values should be treated cautiously, particularly in ALL, ANY, IN and EXISTS queries.
The result of a comparison involving NULL is unknown, which is generally treated as false. This can lead to unexpected results.
For example, neither of the following conditions are true:
<null> IN (...,null,...) <null> NOT IN (...,null,...)The first result is almost intuitive: since NULL is not equal to NULL, NULL is not a member of a set containing NULL.
But if NULL is not a member of a set containing NULL, the second result is intuitively true.
In fact, neither result is true or false: both are unknown. If NULL values are involved on either side of the comparison, IN and NOT IN are not complementary. Similar arguments apply to queries containing ALL or ANY:
Where are hotels with rooms that are more expensive than those at the hotel Skyline (hotel code SKY)?
SELECT NAME, CITY FROM HOTEL AS H, ROOM_PRICES AS RP WHERE H.HOTELCODE = RP.HOTELCODE AND PRICE > ALL (SELECT PRICE FROM ROOM_PRICES WHERE HOTELCODE = 'SKY');This query works as long as there are no NULL values in the PRICE column. But introduce a new room type at Skyline with an unknown price, and the query results in an empty set.
Moreover, the reverse query, hotels that are cheaper than all rooms at Skyline, also results in an empty set. A justification for this is that as long as one price at Skyline is unknown, it is impossible to say whether rooms at other hotels are more or less expensive than those at Skyline.
Using Exists
It is always possible to rephrase a query using ALL, ANY or IN in terms of one using EXISTS (with an outer reference between the selection and the EXISTS condition). This is to be recommended if the NULL indicator is to be permitted in the comparison sets, since NULL handling is then written out explicitly in the query.
Thus, the query above can also be written as follows:
SELECT NAME, CITY FROM HOTEL AS H, ROOM_PRICES AS RP WHERE H.HOTELCODE = RP.HOTELCODE AND NOT EXISTS (SELECT * FROM ROOM_PRICES WHERE HOTELCODE = 'SKY' AND ( PRICE <= RP.PRICE OR PRICE IS NULL OR RP.PRICE IS NULL ));This formulation may be read as 'Find hotels where no room at Skyline is cheaper than or the same price as any room in the hotel in question, as long as no prices are unknown'.
The explicit PRICE IS NULL clause tests that if either of the components of the comparison is NULL, then the subselect is not empty, NOT EXISTS is false, and no row is returned.
In general, a query of the form ($ stands for any comparison operator):
SELECT column-list FROM table1 WHERE column1 $ ALL (SELECT column2 FROM table2 WHERE condition) is equivalent to SELECT column-list FROM table1 WHERE NOT EXISTS (SELECT * FROM table2 WHERE condition AND ( NOT table1.column1 $ table2.column2 OR table1.column1 IS NULL OR table2.column2 IS NULL ));Where are hotels with rooms that have unknown prices or that are more expensive than rooms with known prices at hotel Skyline?
SELECT NAME, CITY FROM HOTEL H, ROOM_PRICES RP WHERE H.HOTELCODE = RP.HOTELCODE AND NOT EXISTS (SELECT * FROM ROOM_PRICES WHERE HOTELCODE = 'SKY' AND PRICE <= RP.PRICE);This query does not exclude the occurrence of the NULL indicator from the comparisons. If there is an unknown price, then the hotel concerned will be included in the result set - even if the unknown price is at Skyline itself. (Skyline might have a room that is more expensive than all rooms with known prices at Skyline).
Formulated with ALL, this query would be:
SELECT NAME, CITY FROM HOTEL H, ROOM_PRICES RP WHERE H.HOTELCODE = RP.HOTELCODE AND PRICE > ALL (SELECT PRICE FROM ROOM_PRICES WHERE HOTELCODE = 'SKY' AND PRICE IS NOT NULL);It is clear from the examples above that distinctions between queries involving NULL comparisons are subtle and are easily overlooked.
It is essential that the aim of a query is stringently defined before the query is formulated in SQL, and that the possible effects of NULL values in the search condition are considered.
There are many real-life examples where the presence of NULL has resulted in unforeseen and sometimes misleading data retrievals. It is advisable to define all columns in the database tables as NOT NULL except those where unknown values have a specific meaning, such as the CHECKIN and CHECKOUT columns in the BOOK_GUEST table. In this way the risks of confusion with NULL handling are minimized.
|
Upright Database Technology AB Voice: +46 18 780 92 00 Fax: +46 18 780 92 40 dbtechnology@upright.se |
|
|