Oracle Performance and Engineering

Random notes on database features, code, practice, optimization and techniques.

Thursday, October 12, 2006

 

Fast == Not repeating yourself

As someone said, the fastest way to do something is to not do it. Nothing can beat zero!

Take the following piece of code -


SELECT p.project_id project_id,
NVL(SUM(NVL (tsld.actual_time, 0)), 0) actuals_to_date,
NVL(SUM(NVL (tsld.actual_time, 0) * NVL (tsl.actual_rate, 0) / (SELECT kcst_cost_util.get_currency_exchange_rate(tsl.currency_id,p.scheduled_start_date) FROM DUAL)),0) cost_to_date_bse,
NVL(SUM(NVL (tsld.actual_time, 0) * NVL (tsl.actual_rate, 0) * DECODE((SELECT kcst_cost_util.get_currency_exchange_rate(r.currency_id,p.scheduled_start_date) FROM DUAL),
(SELECT kcst_cost_util.get_currency_exchange_rate(tsl.currency_id,p.scheduled_start_date) FROM DUAL), 1,
((SELECT kcst_cost_util.get_currency_exchange_rate(r.currency_id,p.scheduled_start_date) FROM DUAL)
/ (SELECT kcst_cost_util.get_currency_exchange_rate(tsl.currency_id,p.scheduled_start_date) FROM DUAL)))),0
) cost_to_date_lcl
FROM kdrv_projects p,
kdrv_projects mp,
kdrv_project_setups ps,
knta_regions r,
ktmg_time_sheet_lines tsl,
ktmg_time_sheets tts,
ktmg_time_sheet_line_details tsld
WHERE tts.status_code != 'CANCELLED'
AND tsl.time_sheet_id = tts.time_sheet_id
AND ( tsl.work_item_type_code = 'TASK'
OR tsl.work_item_type_code = 'PROJECT'
)
AND tsld.time_sheet_line_id = tsl.time_sheet_line_id
AND tsld.activity_id = -1
AND (p.project_id IN
(397198, 68164, 351378, 280015, 281991, 68194)
)
AND tsl.work_item_id = p.project_id
AND tsl.work_item_set_id = p.master_project_id
AND mp.project_id = p.master_project_id
AND mp.project_id = ps.source_id
AND ps.source_entity_id = 6
AND ps.region_id = r.region_id
GROUP BY p.project_id


I was asked by one of our customers if adding an ORDERED hint is OK. It seems they've gotten the performance of the above SQL down to 4.5 sec by adding the hint. No logic was given on why adding the ORDERED hint would make it fast. Or, better - what's actually wrong with the above query. I took a look and found ( I needed to take a look at the data model as well)-

(1) Information - The only attribute that needs to be SUMmed over PROJECT_ID is TSLD.ACTUAL_TIME. ACTUAL_RATE remains constant throughout a project.

How it can help performance - We can remove the GROUP BY. We can just use SUM() analytical function on TSLD.ACTUAL_RATE over (partition by project_id)

(2) What's bad - SUM(NVL(column,0)) is redundant! Whenever SUMming -- all NULLs are ignored (i.e., treated as zero) anyways.

(3) Information - The PL/SQL function kcst_cost_util.get_currency_exchange_rate is called too many times. Let's see the logic behind -


NVL(SUM(NVL (tsld.actual_time, 0) * NVL (tsl.actual_rate, 0) * DECODE((SELECT kcst_cost_util.get_currency_exchange_rate(r.currency_id,p.scheduled_start_date) FROM DUAL),
(SELECT kcst_cost_util.get_currency_exchange_rate(tsl.currency_id,p.scheduled_start_date) FROM DUAL), 1,
((SELECT kcst_cost_util.get_currency_exchange_rate(r.currency_id,p.scheduled_start_date) FROM DUAL)
/ (SELECT kcst_cost_util.get_currency_exchange_rate(tsl.currency_id,p.scheduled_start_date) FROM DUAL)))),0
) cost_to_date_lcl





if
(SELECT kcst_cost_util.get_currency_exchange_rate(r.currency_id,p.scheduled_start_date) = (SELECT kcst_cost_util.get_currency_exchange_rate(tsl.currency_id,p.scheduled_start_date) FROM DUAL)
then nvl((SUM(tsld.actual_time) * tsl.actual_rate),0)
else
NVL( SUM(tsld.actual_time) * tsl.actual_rate * (SELECT kcst_cost_util.get_currency_exchange_rate(r.currency_id,p.scheduled_start_date) FROM DUAL)
/ (SELECT kcst_cost_util.get_currency_exchange_rate(tsl.currency_id,p.scheduled_start_date) FROM DUAL)), 0)
end if;


Now, isn't (for most of the cases -- there're exceptions though!) --


if F(x) = F(y) then
x = y;
end if;


Thus we can simplify the part from the SQL above from -



DECODE((SELECT kcst_cost_util.get_currency_exchange_rate(r.currency_id,p.scheduled_start_date) FROM DUAL),
(SELECT kcst_cost_util.get_currency_exchange_rate(tsl.currency_id,p.scheduled_start_date) FROM DUAL), 1,
((SELECT kcst_cost_util.get_currency_exchange_rate(r.currency_id,p.scheduled_start_date) FROM DUAL)
/ (SELECT kcst_cost_util.get_currency_exchange_rate(tsl.currency_id,p.scheduled_start_date) FROM DUAL)))




To -

(I checked that the function will return the same value for same variable. i.e., deterministic. In other words, I would not be able to transform the following without a look at the source code.


DECODE(r.currency_id,tsl.currency_id, 1,
((SELECT kcst_cost_util.get_currency_exchange_rate(r.currency_id,p.scheduled_start_date) FROM DUAL)
/ (SELECT kcst_cost_util.get_currency_exchange_rate(tsl.currency_id,p.scheduled_start_date) FROM DUAL)))



The final optimized version of the query looked like (I actually did not check in the immediate above transformation since the gain was minute, but the logic still holds) --




SELECT project_id,
NVL(actd, 0) actuals_to_date,
NVL((actd * actual_rate/ tslc),0) cost_to_date_bse,
NVL((actd * actual_rate * DECODE(rcri, tslc, 1,(rcri / tslc))),0) cost_to_date_lcl
FROM
(SELECT p.project_id, SUM(tsld.actual_time) over (PARTITION BY p.project_id) actd , NVL(tsl.actual_rate, 0) actual_rate,
(SELECT kcst_cost_util.get_currency_exchange_rate(tsl.currency_id,p.scheduled_start_date) FROM DUAL) tslc,
(SELECT kcst_cost_util.get_currency_exchange_rate(r.currency_id,p.scheduled_start_date) FROM DUAL) rcri
FROM kdrv_projects p,
kdrv_projects mp,
kdrv_project_setups ps,
knta_regions r,
ktmg_time_sheet_lines tsl,
ktmg_time_sheets tts,
ktmg_time_sheet_line_details tsld
WHERE tts.status_code != 'CANCELLED'
AND tsl.time_sheet_id = tts.time_sheet_id
AND ( tsl.work_item_type_code = 'TASK'
OR tsl.work_item_type_code = 'PROJECT'
)
AND tsld.time_sheet_line_id = tsl.time_sheet_line_id
AND tsld.activity_id = -1
AND (p.project_id IN
(397198, 68164, 351378, 280015, 281991, 68194)
)
AND tsl.work_item_id = p.project_id
AND tsl.work_item_set_id = p.master_project_id
AND mp.project_id = p.master_project_id
AND mp.project_id = ps.source_id
AND ps.source_entity_id = 6
AND ps.region_id = r.region_id)



The improvements -
(1) I evaluated the PL/SQL function values in the sub-query itself.
(2) I removed the group by and used Oracle Analytic Function SUM() instead (Great when you really are grouping by just a few of many columns present in the SELECT clause).
(3) I removed the redundant (sum(nvl(column,0))) to sum(column)


Final version of the query ran, as expected, beating the hinted version by a whole 2 seconds!

Comments:
This comment has been removed by the author.
 
Hi Nilendu,
This is great stuff for someone who is looking to optimize performance for their ITG implementation.
We have a client who has serious issue with performance with their ITG implementation as they have global users and are looking for some way to optimize access to users. Wanted to check if materialized views can be used to have data at local offices so that atleast portlets access will be local rather than over the network. (I am not a Oracle expert and I am working with the client to check if this is a possibility). If you have any idea on what could help them do let me know.

- pradeepkmrm@gmail.com
 
Post a Comment



<< Home

Archives

January 2005   February 2005   March 2005   April 2005   May 2005   March 2006   April 2006   August 2006   September 2006   October 2006   January 2007   May 2008  

This page is powered by Blogger. Isn't yours?