Oracle Performance and Engineering

Random notes on database features, code, practice, optimization and techniques.

Thursday, April 13, 2006

Query Tuned for complicated OR clause

Investigating a certain Statspack today I noticed that 58% of the DB CPU time is spent in a SQL as follows -


                                                                                          CPU      Elapsd
  Buffer Gets    Executions  Gets per Exec  %Total Time (s)  Time (s)
--------------- ------------ -------------- ------ -------- ---------

478,456,910     3,804         125,777.3               58.5     2334.82  2302.48




SELECT *
  FROM knta_exceptions ex
 WHERE (source_id IN (
              SELECT kp2.project_id
                FROM kdrv_projects kp2, kdrv_projects kp1
               WHERE kp1.project_id = :p_source_id
                 AND kp2.project_path_id_list LIKE kp1.project_path_id_list || '%'
                 AND kp2.project_type_code = 'TASK')
       OR
       (source_id IN (
               SELECT kp2.project_id
                 FROM kdrv_projects kp2, kdrv_projects kp1
                WHERE kp1.project_id = :p_source_id
                AND kp2.project_path_id_list LIKE kp1.project_path_id_list || '%'
                AND kp2.project_type_code = 'PROJECT')
        AND source_entity_id = 59))
  AND source_entity_id IN (59, 6);

This SQL was doing 125,777 Logical I/O (buffer gets) per execution.

Explain plan in 9i looks like -



-----------------------------------------------------------------------------------
| Id  | Operation                     |  Name             | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                   |  4049 |   478K|   168 |
|   1 |  FILTER                       |                   |       |       |       |
|   2 |   TABLE ACCESS FULL           | KNTA_EXCEPTIONS   |  4049 |   478K|   168 |
|   3 |   NESTED LOOPS                |                   |     1 |    78 |     3 |
|   4 |    TABLE ACCESS BY INDEX ROWID| KDRV_PROJECTS     |     1 |    36 |     2 |
|   5 |     INDEX UNIQUE SCAN         | KDRV_PROJECTS_U1  |     1 |       |     1 |
|   6 |    TABLE ACCESS BY INDEX ROWID| KDRV_PROJECTS     |     1 |    42 |     1 |
|   7 |     INDEX UNIQUE SCAN         | KDRV_PROJECTS_U1  |     1 |       |       |
|   8 |   NESTED LOOPS                |                   |     1 |    78 |     3 |
|   9 |    TABLE ACCESS BY INDEX ROWID| KDRV_PROJECTS     |     1 |    36 |     2 |
|  10 |     INDEX UNIQUE SCAN         | KDRV_PROJECTS_U1  |     1 |       |     1 |
|  11 |    TABLE ACCESS BY INDEX ROWID| KDRV_PROJECTS     |     1 |    42 |     1 |
|  12 |     INDEX UNIQUE SCAN         | KDRV_PROJECTS_U1  |     1 |       |       |
-----------------------------------------------------------------------------------

Notice, how we are scanning KDRV_PROJECTS once for each OR-ed parts, eventually doing a full-table scan of KNTA_EXCEPTIONS (a table with more than 480K records).

Taking a close look at the SQL statement, you can see the TWO "OR" parts are exactly the same, except (a) the earlier block searches for PROJECT_TYPE_CODE ='TASK'; and (b) the later block searches for PROJECT_TYPE_CODE = 'PROJECT' *and* SOURCE_ENTITY_ID (a column in outer table KNTA_EXCEPTIONS) = 59.

This SQL can be written in a much simplified -



       SELECT *  FROM knta_exceptions ex
              WHERE source_id IN ( SELECT kp2.project_id
                                     FROM kdrv_projects kp2,
                                          kdrv_projects kp1
                                    WHERE kp1.project_id =:p_source_id
                                      AND kp2.project_path_id_list LIKE kp1.project_path_id_list||'%'
                                      AND (kp2.project_type_code = 'TASK' OR (kp2.project_type_code = 'PROJECT' AND source_entity_id = 59)))
-- just one OR condition that takes care of both.
                                      AND source_entity_id IN (59, 6)

For which the explain plan looks like -



-----------------------------------------------------------------------------------
| Id  | Operation                     |  Name             | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                   |  1794 |   348K|   639 |
|   1 |  HASH JOIN                    |                   |  1794 |   348K|   639 |
|   2 |   NESTED LOOPS                |                   | 14112 |  1074K|   385 |
|   3 |    TABLE ACCESS BY INDEX ROWID| KDRV_PROJECTS     |     1 |    36 |     2 |
|   4 |     INDEX UNIQUE SCAN         | KDRV_PROJECTS_U1  |     1 |       |     1 |
|   5 |    TABLE ACCESS BY INDEX ROWID| KDRV_PROJECTS     | 14112 |   578K|   383 |
|   6 |     INDEX RANGE SCAN          | KDRV_PROJECTS_N2  |  2540 |       |    16 |
|   7 |   TABLE ACCESS FULL           | KNTA_EXCEPTIONS   | 61507 |  7267K|   168 |
-----------------------------------------------------------------------------------

Notice the change in explain plan. Now we are doing a hash join between the KDRV_PROJECTS(s) and KNTA_EXCEPTIONS. KNTA_EXCEPTIONS being the "probe" table. This plan was obtained in 9i. The consistent gets is down to 1800!!

I tried the same SQL in 10g (10.1.0.3 with CPU COSTING off ) and got even better statistics (16 Logical I/O for same data) and a different plan as shown below -



-------------------------------------------------------------------------------------
| Id  | Operation                      | Name               | Rows  | Bytes | Cost  |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                    |  9953 |  1846K|   146 |
|   1 |  TABLE ACCESS BY INDEX ROWID   | KNTA_EXCEPTIONS    |     1 |   112 |     1 |
|   2 |   NESTED LOOPS                 |                    |  9953 |  1846K|   146 |
|   3 |    NESTED LOOPS                |                    | 14112 |  1074K|     5 |
|   4 |     TABLE ACCESS BY INDEX ROWID| KDRV_PROJECTS      |     1 |    36 |     1 |
|   5 |      INDEX UNIQUE SCAN         | KDRV_PROJECTS_U1   |     1 |       |     1 |
|   6 |     TABLE ACCESS BY INDEX ROWID| KDRV_PROJECTS      | 14112 |   578K|     4 |
|   7 |      INDEX RANGE SCAN          | KDRV_PROJECTS_N2   |  2540 |       |     0 |
|   8 |    INLIST ITERATOR             |                    |       |       |       |
|   9 |     INDEX RANGE SCAN           | KNTA_EXCEPTIONS_N1 |     2 |       |     0 |
-------------------------------------------------------------------------------------

Notice, the difference in the plan. On 10g, a "nested loop" is chosen ahead of the "hash join". It also uses the index KNTA_EXCEPTIONS_N1 (on SOURCE_ID, SOURCE_ENTITY_ID in KNTA_EXCEPTIONS) for "inlist iterations".

So, assuming the customer changes this code, the number of buffer_gets per execution will come down from 175K to 1.8K -- possibly reducing the CPU_Time by at least 90%-95%. The total CPU Overhead in this 9i DB was 58%. After pushing the change, the CPU Overhead for this SQL should should come to around 5% or less (extrapolating the response time and LIO). Nice, ain't it?

posted by Nilendu # 7:11 PM

Comments: Post a Comment

<< Home

Oracle Performance and Engineering

Thursday, April 13, 2006

Query Tuned for complicated OR clause

Archives