Oracle Performance and Engineering

Random notes on database features, code, practice, optimization and techniques.

Tuesday, March 15, 2005

'Blind' Tuning Methods - A walkthrough

Is there any other job in the world where people just ignore you where you can really help with your knowledge and experience (design phase), but looks up with cute puss in boots type eyes to you to 'save the day' when those designs don't work and an irate customer is threatening to walk out??

Well - if you're the 'performance evangelist' -- that's an integral part of the job. It's fun too. Because once you actually make the application 'Die another day' - you're a hero. People actually don't want to ask "What are you working on these days" during lunch. But they are wide-eared even if you're talking about some silly, stupid ideas. You, only you probably, secretly sigh thinking how *easy* it could have been to prevent the issue at the first place. As Einstein said, "Intelligent people solve problems, Geniuses prevent them (from happening)". But then when was someone declared a hero for doing things as expected?

However, I had a hard call today with one of the "big five's" consultant DBA frantically asking for help after getting thrashed by the customer.

The issue :

(winnowed down from several emails) -

The on-site consultant sent me the following SQL with the plan output and the execution statistics. Apparently, "they found another query that is costing too much I/O for them that they are claiming" and " forgot to mentione the problem: This is another sql statement that is using a large amount of sort space and is currently executing. It is over 2 GB right now. I have included the explain and execution plans with statistics for one project id."

SELECT top_proj.project_id as project_id,
kpw.period_id as period_id,
round(sum(kpc.actual_labor_cost* ACCN_EV_ANALYSIS_UTILS.GET_TASK_RATIO(kpw.period_id,kp.project_id)),2) as actual_labor_cost
from ACCN_project_costs kpc,
ACCN_projects kp,
ACCN_projects top_proj,
ACCN_periods kpw
where top_proj.project_id = :b1
and kpc.project_id = kp.project_id
and kp.project_path_id_list like top_proj.project_path_id_list||'%'
and kp.project_type_code = 'TASK'
and kpw.period_type_id = 3
and kpw.period_id in (select period_id
from ACCN_cost_metrics_by_period where project_id = :b1)
group by kpw.period_id, top_proj.project_id

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 468 | 44460 | 5074 |
| 1 | SORT GROUP BY | | 468 | 44460 | 5074 |
| 2 | HASH JOIN | | 77743 | 7212K| 4429 |
| 3 | VIEW | index$_join$_004 | 784 | 6272 | 9 |
| 4 | HASH JOIN | | 77743 | 7212K| 4429 |
| 5 | INDEX RANGE SCAN | ACCN_PERIODS_N2 | 784 | 6272 | 3 |
| 6 | INDEX FAST FULL SCAN | ACCN_PERIODS_U1 | 784 | 6272 | 3 |
| 7 | MERGE JOIN | | 77743 | 6605K| 4418 |
| 8 | SORT JOIN | | 2961K| 141M| 3254 |
| 9 | MERGE JOIN CARTESIAN | | 2961K| 141M| 3254 |
| 10 | NESTED LOOPS | | 26 | 1040 | 4 |
| 11 | TABLE ACCESS BY INDEX ROWID| ACCN_PROJECTS | 1 | 31 | 2 |
| 12 | INDEX UNIQUE SCAN | ACCN_PROJECTS_U1 | 121K| | 1 |
| 13 | SORT UNIQUE | | | | |
| 14 | INDEX RANGE SCAN | ACCN_COST_METRICS_BY_PERIOD_U2 | 26 | 234 | 2 |
| 15 | BUFFER SORT | | 115K| 1129K| 3252 |
| 16 | TABLE ACCESS FULL | ACCN_PROJECT_COSTS | 115K| 1129K| 125 |
| 17 | FILTER | | | | |
| 18 | SORT JOIN | | | | |
| 19 | TABLE ACCESS FULL | ACCN_PROJECTS | 60725 | 2194K| 737 |
-----------------------------------------------------------------------------------------------------

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=5074 Card=468 Bytes=44460)
1 0 SORT (GROUP BY) (Cost=5074 Card=468 Bytes=44460)
2 1 HASH JOIN (Cost=4429 Card=77743 Bytes=7385585)
3 2 VIEW OF 'index$_join$_004' (Cost=9 Card=784 Bytes=6272)
4 3 HASH JOIN (Cost=4429 Card=77743 Bytes=7385585)
5 4 INDEX (RANGE SCAN) OF 'ACCN_PERIODS_N2' (NON-UNIQUE) (Cost=3 Card=784 Bytes=6272)
6 4 INDEX (FAST FULL SCAN) OF 'ACCN_PERIODS_U1' (UNIQUE) (Cost=3 Card=784 Bytes=6272)
7 2 MERGE JOIN (Cost=4418 Card=77743 Bytes=6763641)
8 7 SORT (JOIN) (Cost=3254 Card=2961366 Bytes=148068300)
9 8 MERGE JOIN (CARTESIAN) (Cost=3254 Card=2961366 Bytes=148068300)
10 9 NESTED LOOPS (Cost=4 Card=26 Bytes=1040)
11 10 TABLE ACCESS (BY INDEX ROWID) OF 'ACCN_PROJECTS' (Cost=2 Card=1 Bytes=31)
12 11 INDEX (UNIQUE SCAN) OF 'ACCN_PROJECTS_U1' (UNIQUE) (Cost=1 Card=121450)
13 10 SORT (UNIQUE)
14 13 INDEX (RANGE SCAN) OF 'ACCN_COST_METRICS_BY_PERIOD_U2' (UNIQUE) (Cost=2 Card=26 Bytes=234)
15 9 BUFFER (SORT) (Cost=3252 Card=115656 Bytes=1156560)
16 15 TABLE ACCESS (FULL) OF 'ACCN_PROJECT_COSTS' (Cost=125 Card=115656 Bytes=1156560)
17 7 FILTER
18 17 SORT (JOIN)
19 18 TABLE ACCESS (FULL) OF 'ACCN_PROJECTS' (Cost=737 Card=60725 Bytes=2246825)

Statistics
----------------------------------------------------------
2245 recursive calls
43 db block gets
12336 consistent gets
47510 physical reads
0 redo size
1886 bytes sent via SQL*Net to client
688 bytes received via SQL*Net from client
5 SQL*Net roundtrips to/from client
17 sorts (memory)
1 sorts (disk)
54 rows processed

Analysis -

Looking at the plan and statistics, the 'bad' keyphrases in this case are -

(1) 47510 physical reads (almost 4 times the 'Consistent Gets' or Logical I/O)
(2) 2 1 HASH JOIN (Cost=4429 Card=77743 Bytes=7385585)
(3) 9 8 MERGE JOIN (CARTESIAN) (Cost=3254 Card=2961366 Bytes=148068300)

One cartesian join (i.e., no join condition found between the two data source) producing 148MB of data and one HASH join on this data will certainly use a lot of "sorting" area as they've noted.

Solution / Logical Analysis -

Let's get our critical thinking rolling. I did not have access to this particular database. So firstly,
I tried to see if there's actually a table that's not joined with other tables - i.e., candidate for MERGE JOIN (CARTESIAN).

The query contains four tables and if you compare the joins you'll find that the table ACCN_PERIODS is not directly joined with other three tables -- though ACCN_PERIODS is supposed to be 'qualified' (joined well) since the IN-clause returns 'just a few rows' (got that information from the consultant).

(Probably?) They tried to print a report for some project for *each* period.

The other tables are, in fact, very well joined. In fact, ACCN_PROJECTS (TOP_PROJ) is supplied a value for its primary key - which then should join well with the other ACCN_PROJECTS (KP). KP and KPC join with PROJECT_ID which is primary key on KP and an index on KPC.

So - this is how I think these issues - * if I were the optimizer * -- I would do

(1) find row from TOP_PROJ (PRIMARY KEY Access by INDEX ROWID)

(2) join the row with KP (*probably* Nested LOOP - since the #rows from 1 is 1)

(3) assuming a few rows from KP returned, NESTED LOOP with KPC (a primary key - index join)

(4) find out the qualified rows from KPW by comparing it with the IN-list

(5) just a few rows from the datasource returned by #3 - doing a CARTESIAN JOIN (CROSS JOIN X * Y) with datasource returned by #4

Thus, CARTESIAN join is unavoidable -- but in the 'if I were optimizer' method - we make sure cartesian join takes place between two smallest data source possible. Only after all other filters and predicates have been applied to them.

What's wrong in reality -

Well, look at the plan. The optimizer somehow does confuse. Probably the IN-list is 'merged' with the main query block. So now, the query has Five tables to deal with. So the real plan by optimizer is like -

(1) TOP_PROJ access -- INDEX UNIQUE SCAN since it's a primary key

(2) ACCN_cost_metrics_by_period access - based on project_id

(3) Optimizer sees both tables TOP_PROJ and ACCN_COST_ using PROJECT_ID = :b1 join -- so it ends up joining those two instead.

(So, does it do the same only during a CARTESIAN join? I believe not. Optimizer tries to put as many joins as possible all the time)

(4) Dataset returned by #3 (26 rows) is then CARTESIAN joined with KPC (115K rows)

CARD of step 8 (7) should then be 115656 * 26 = 3007056. In reality, it's 2961366.

There goes the last hope for our query!

(5) Dataset returned by #4 is then joined with KP. But it's a MERGE JOIN - since the number of records returned by #4 is too high.

..and so on..

Fix -

So if we can convince Oracle to follow our path as mentioned in solution - it would not join TOP_PROJ and ACCN_cost_metrics_by_period using PROJECT_ID (this exactly screws up the plan!).

So I rewrote the query and added the ORDERED hint so that TOP_PROJ is earlier joined with other tables and ACCN_COST_...table is used only to join KPW.

The modified query looked like -

SELECT /*+ ORDERED */ top_proj.project_id as project_id,
kpw.period_id as period_id,
round(sum(kpc.actual_labor_cost* ACCN_EV_ANALYSIS_UTILS.GET_TASK_RATIO(kpw.period_id,kp.project_id)),2) as actual_labor_cost
from ACCN_projects top_proj,
ACCN_projects kp,
ACCN_project_costs kpc,
ACCN_periods kpw
-- note the ORDER
where top_proj.project_id = :b1
and kpc.project_id = kp.project_id
and kp.project_path_id_list like top_proj.project_path_id_list||'%'
and kp.project_type_code = 'TASK'
and kpw.period_type_id = 3
and kpw.period_id in (select period_id
from ACCN_cost_metrics_by_period where project_id = :b1)
group by kpw.period_id, top_proj.project_id;

The plan then looked like -

# Operation Options Object name
======== ================ ============== ==============================
0 SELECT STATEMENT
1 SORT GROUP BY
2 NESTED LOOPS
3 MERGE JOIN CARTESIAN
4 NESTED LOOPS
5 NESTED LOOPS
6 TABLE ACCESS BY INDEX ROWID KDRV_PROJECTS
7 INDEX UNIQUE SCAN KDRV_PROJECTS_U1
8 SORT UNIQUE
9 INDEX RANGE SCAN KDRV_COST_METRICS_BY_PERIOD_U2
10 TABLE ACCESS BY INDEX ROWID KNTA_PERIODS
11 INDEX UNIQUE SCAN KNTA_PERIODS_U1
12 BUFFER SORT
13 TABLE ACCESS FULL KDRV_PROJECT_COSTS
14 TABLE ACCESS BY INDEX ROWID KDRV_PROJECTS
15 INDEX UNIQUE SCAN KDRV_PROJECTS_U1

Look, the way the joins are "as intended" now. The query doesn't use any more MERGE or HASH joins - and executed well under 1 sec with less than 4000 Consistent Gets.

In future, I intend to discuss this optimizer "unnesting" part more with some more real life examples. Remind me :)

posted by Nilendu # 4:57 PM

Comments:

Well done Nilendu - Takla

# posted by

Anonymous : 8:20 AM

Oracle Performance and Engineering

Tuesday, March 15, 2005

'Blind' Tuning Methods - A walkthrough

Archives