Indexing only null column values in Oracle

We have a table that can have tens or hundreds of millions of rows in it, and in this table there is a certain column. This column usually—typically—generally—has non-null hash values in it.

In unusual cases though, that column can be NULL. And the thing is, there are times when the interesting rows, the rows we want to gather together and do something with, are exactly the rows where that column is null.

Table scans to find the rows where the column is NULL are expensive, so naturally we want to index this column. But since non-null values in this column are a hash and we only ever care about the NULL values, we’d like to avoid the time and space overhead of indexing the non-null values. We’d like a partial index of only the rows where the column is NULL…

On SQL Server (2008 or newer) we’d just use a filtered index, problem solved:

CREATE NONCLUSTERED INDEX MyIndex ON MyTable (MyColumn) WHERE (MyColumn IS NULL)

On Oracle we’ll use a function-based index, but there’s an additional wrinkle: Oracle will index rows with any index key except NULL, and we need to index just rows with an index key of NULL.

Solution

Our Oracle solution has three parts:

  1. Create a user-defined function that adapts NULL values to a constant (indexable) value and adapts non-null index keys to NULL (that is, nonindexable);
  2. Create a function-based index referencing the new user-defined function; and
  3. Adjust your query to reference the function and thus get the benefit of the new index.

1. The user-defined function

CREATE OR REPLACE
FUNCTION index_null_val(value NVARCHAR2)
RETURN NUMBER
DETERMINISTIC
AS BEGIN
   IF value IS NULL THEN
      RETURN 1;
   ELSE
      RETURN NULL;
   END IF;
END;
/

2. The index referencing the function

CREATE INDEX ABCHashNull ON ABC (index_null_val(Hash));

3. The query referencing the function

Previously our query looked like this:

SELECT Id, Hash
FROM ABC
WHERE DateVal <= CURRENT_TIMESTAMP
AND HASH IS NULL;

Now we’re going to adjust line 4 to reference our function:

SELECT Id, Hash
FROM ABC
WHERE DateVal <= CURRENT_TIMESTAMP
AND index_null_val(Hash) = 1;

Before and after

Before

When we run the query as previously written, we get a full table scan…

SET AUTOTRACE ON EXPLAIN
SELECT Id, Hash
FROM ABC
WHERE DateVal <= CURRENT_TIMESTAMP
AND HASH IS NULL;

Displays the execution plan only.
        ID HASH                                                   
---------- --------------------------------------------------------
         2                                                          
         4                                                          

Plan hash value: 1033171814
 
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     2 |   168 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| ABC  |     2 |   168 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("HASH" IS NULL AND SYS_EXTRACT_UTC(INTERNAL_FUNCTION("DATE
              VAL"))<=SYS_EXTRACT_UTC(CURRENT_TIMESTAMP(6)))
 
Note
-----
   - dynamic sampling used for this statement (level=2)

After

When we adjust the query to make use of our new index function, we get an index range scan—much better.

SET AUTOTRACE ON EXPLAIN
SELECT Id, Hash
FROM ABC
WHERE DateVal <= CURRENT_TIMESTAMP
AND index_null_val(Hash) = 1;

        ID HASH                                                   
---------- --------------------------------------------------------
         2                                                          
         4                                                          

Plan hash value: 3267390114
 
-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |     1 |    97 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| ABC         |     1 |    97 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | ABCHASHNULL |     1 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter(SYS_EXTRACT_UTC(INTERNAL_FUNCTION("DATEVAL"))<=SYS_EXTRACT_UTC(CURREN
              T_TIMESTAMP(6)))
   2 - access("I3_IC"."INDEX_NULL_VAL"("HASH")=1)
 
Note
-----
   - dynamic sampling used for this statement (level=2)
SET SERVEROUTPUT ON
SET AUTOTRACE OFF

CREATE TABLE ABC ( Id int primary key, DateVal timestamp(3), Hash nvarchar2(56) NULL );
INSERT INTO ABC(Id, DateVal, Hash) VALUES(1, CURRENT_TIMESTAMP, 'ABC123');
INSERT INTO ABC(Id, DateVal, Hash) VALUES(2, CURRENT_TIMESTAMP, NULL);
INSERT INTO ABC(Id, DateVal, Hash) VALUES(3, CURRENT_TIMESTAMP, 'DEF456');
INSERT INTO ABC(Id, DateVal, Hash) VALUES(4, CURRENT_TIMESTAMP, NULL);
INSERT INTO ABC(Id, DateVal, Hash) VALUES(5, CURRENT_TIMESTAMP, 'GHI789');

CREATE OR REPLACE FUNCTION index_null_val(value NVARCHAR2)
RETURN NUMBER
DETERMINISTIC
AS BEGIN
   IF value IS NULL THEN
      RETURN 1;
   ELSE
      RETURN NULL;
   END IF;
END;
/

CREATE INDEX ABCHashNull ON ABC (index_null_val(Hash));

SET AUTOTRACE ON EXPLAIN
SELECT Id, Hash
FROM ABC
WHERE DateVal <= CURRENT_TIMESTAMP
AND HASH IS NULL;


SELECT Id, Hash
FROM ABC
WHERE DateVal <= CURRENT_TIMESTAMP
AND index_null_val(Hash) = 1;

SET AUTOTRACE OFF

--DROP INDEX ABCHashNull;
--DROP FUNCTION index_null_val;
--DROP TABLE ABC;

Acknowledgment

Thanks to my co-worker Matthew Rhea for pointing me to two posts that got me started on this solution.

Advertisements

,

  1. Leave a comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s