Oracle Performance and Engineering

Random notes on database features, code, practice, optimization and techniques.

Thursday, April 13, 2006

 

Query Tuned for complicated OR clause

Investigating a certain Statspack today I noticed that 58% of the DB CPU time is spent in a SQL as follows -


CPU Elapsd
Buffer Gets Executions Gets per Exec %Total Time (s) Time (s)
--------------- ------------ -------------- ------ -------- ---------

478,456,910 3,804 125,777.3 58.5 2334.82 2302.48





SELECT *
FROM knta_exceptions ex
WHERE (source_id IN (
SELECT kp2.project_id
FROM kdrv_projects kp2, kdrv_projects kp1
WHERE kp1.project_id = :p_source_id
AND kp2.project_path_id_list LIKE kp1.project_path_id_list || '%'
AND kp2.project_type_code = 'TASK')
OR
(source_id IN (
SELECT kp2.project_id
FROM kdrv_projects kp2, kdrv_projects kp1
WHERE kp1.project_id = :p_source_id
AND kp2.project_path_id_list LIKE kp1.project_path_id_list || '%'
AND kp2.project_type_code = 'PROJECT')
AND source_entity_id = 59))
AND source_entity_id IN (59, 6);



This SQL was doing 125,777 Logical I/O (buffer gets) per execution.

Explain plan in 9i looks like -



-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4049 | 478K| 168 |
| 1 | FILTER | | | | |
| 2 | TABLE ACCESS FULL | KNTA_EXCEPTIONS | 4049 | 478K| 168 |
| 3 | NESTED LOOPS | | 1 | 78 | 3 |
| 4 | TABLE ACCESS BY INDEX ROWID| KDRV_PROJECTS | 1 | 36 | 2 |
| 5 | INDEX UNIQUE SCAN | KDRV_PROJECTS_U1 | 1 | | 1 |
| 6 | TABLE ACCESS BY INDEX ROWID| KDRV_PROJECTS | 1 | 42 | 1 |
| 7 | INDEX UNIQUE SCAN | KDRV_PROJECTS_U1 | 1 | | |
| 8 | NESTED LOOPS | | 1 | 78 | 3 |
| 9 | TABLE ACCESS BY INDEX ROWID| KDRV_PROJECTS | 1 | 36 | 2 |
| 10 | INDEX UNIQUE SCAN | KDRV_PROJECTS_U1 | 1 | | 1 |
| 11 | TABLE ACCESS BY INDEX ROWID| KDRV_PROJECTS | 1 | 42 | 1 |
| 12 | INDEX UNIQUE SCAN | KDRV_PROJECTS_U1 | 1 | | |
-----------------------------------------------------------------------------------



Notice, how we are scanning KDRV_PROJECTS once for each OR-ed parts, eventually doing a full-table scan of KNTA_EXCEPTIONS (a table with more than 480K records).

Taking a close look at the SQL statement, you can see the TWO "OR" parts are exactly the same, except (a) the earlier block searches for PROJECT_TYPE_CODE ='TASK'; and (b) the later block searches for PROJECT_TYPE_CODE = 'PROJECT' *and* SOURCE_ENTITY_ID (a column in outer table KNTA_EXCEPTIONS) = 59.

This SQL can be written in a much simplified -



SELECT * FROM knta_exceptions ex
WHERE source_id IN ( SELECT kp2.project_id
FROM kdrv_projects kp2,
kdrv_projects kp1
WHERE kp1.project_id =:p_source_id
AND kp2.project_path_id_list LIKE kp1.project_path_id_list||'%'
AND (kp2.project_type_code = 'TASK' OR (kp2.project_type_code = 'PROJECT' AND source_entity_id = 59)))
-- just one OR condition that takes care of both.
AND source_entity_id IN (59, 6)



For which the explain plan looks like -



-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1794 | 348K| 639 |
| 1 | HASH JOIN | | 1794 | 348K| 639 |
| 2 | NESTED LOOPS | | 14112 | 1074K| 385 |
| 3 | TABLE ACCESS BY INDEX ROWID| KDRV_PROJECTS | 1 | 36 | 2 |
| 4 | INDEX UNIQUE SCAN | KDRV_PROJECTS_U1 | 1 | | 1 |
| 5 | TABLE ACCESS BY INDEX ROWID| KDRV_PROJECTS | 14112 | 578K| 383 |
| 6 | INDEX RANGE SCAN | KDRV_PROJECTS_N2 | 2540 | | 16 |
| 7 | TABLE ACCESS FULL | KNTA_EXCEPTIONS | 61507 | 7267K| 168 |
-----------------------------------------------------------------------------------



Notice the change in explain plan. Now we are doing a hash join between the KDRV_PROJECTS(s) and KNTA_EXCEPTIONS. KNTA_EXCEPTIONS being the "probe" table. This plan was obtained in 9i. The consistent gets is down to 1800!!

I tried the same SQL in 10g (10.1.0.3 with CPU COSTING off ) and got even better statistics (16 Logical I/O for same data) and a different plan as shown below -



-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9953 | 1846K| 146 |
| 1 | TABLE ACCESS BY INDEX ROWID | KNTA_EXCEPTIONS | 1 | 112 | 1 |
| 2 | NESTED LOOPS | | 9953 | 1846K| 146 |
| 3 | NESTED LOOPS | | 14112 | 1074K| 5 |
| 4 | TABLE ACCESS BY INDEX ROWID| KDRV_PROJECTS | 1 | 36 | 1 |
| 5 | INDEX UNIQUE SCAN | KDRV_PROJECTS_U1 | 1 | | 1 |
| 6 | TABLE ACCESS BY INDEX ROWID| KDRV_PROJECTS | 14112 | 578K| 4 |
| 7 | INDEX RANGE SCAN | KDRV_PROJECTS_N2 | 2540 | | 0 |
| 8 | INLIST ITERATOR | | | | |
| 9 | INDEX RANGE SCAN | KNTA_EXCEPTIONS_N1 | 2 | | 0 |
-------------------------------------------------------------------------------------



Notice, the difference in the plan. On 10g, a "nested loop" is chosen ahead of the "hash join". It also uses the index KNTA_EXCEPTIONS_N1 (on SOURCE_ID, SOURCE_ENTITY_ID in KNTA_EXCEPTIONS) for "inlist iterations".

So, assuming the customer changes this code, the number of buffer_gets per execution will come down from 175K to 1.8K -- possibly reducing the CPU_Time by at least 90%-95%. The total CPU Overhead in this 9i DB was 58%. After pushing the change, the CPU Overhead for this SQL should should come to around 5% or less (extrapolating the response time and LIO). Nice, ain't it?

Comments: Post a Comment



<< Home

Archives

January 2005   February 2005   March 2005   April 2005   May 2005   March 2006   April 2006   August 2006   September 2006   October 2006   January 2007   May 2008  

This page is powered by Blogger. Isn't yours?