Oracle Performance and Engineering

Random notes on database features, code, practice, optimization and techniques.

Thursday, October 12, 2006

Fast == Not repeating yourself

As someone said, the fastest way to do something is to not do it. Nothing can beat zero!

Take the following piece of code -


SELECT   p.project_id project_id,
       NVL(SUM(NVL (tsld.actual_time, 0)), 0) actuals_to_date,
       NVL(SUM(NVL (tsld.actual_time, 0) * NVL (tsl.actual_rate, 0) / (SELECT kcst_cost_util.get_currency_exchange_rate(tsl.currency_id,p.scheduled_start_date) FROM DUAL)),0) cost_to_date_bse,
       NVL(SUM(NVL (tsld.actual_time, 0) * NVL (tsl.actual_rate, 0) * DECODE((SELECT kcst_cost_util.get_currency_exchange_rate(r.currency_id,p.scheduled_start_date) FROM DUAL),
                                                                                  (SELECT kcst_cost_util.get_currency_exchange_rate(tsl.currency_id,p.scheduled_start_date) FROM DUAL), 1,
                                                                             ((SELECT kcst_cost_util.get_currency_exchange_rate(r.currency_id,p.scheduled_start_date) FROM DUAL)
                                                                            / (SELECT kcst_cost_util.get_currency_exchange_rate(tsl.currency_id,p.scheduled_start_date) FROM DUAL)))),0
                                                                              ) cost_to_date_lcl
             FROM kdrv_projects p,
                  kdrv_projects mp,
                  kdrv_project_setups ps,
                  knta_regions r,
                  ktmg_time_sheet_lines tsl,
                  ktmg_time_sheets tts,
                  ktmg_time_sheet_line_details tsld
            WHERE tts.status_code != 'CANCELLED'
              AND tsl.time_sheet_id = tts.time_sheet_id
              AND (   tsl.work_item_type_code = 'TASK'
                   OR tsl.work_item_type_code = 'PROJECT'
                  )
              AND tsld.time_sheet_line_id = tsl.time_sheet_line_id
              AND tsld.activity_id = -1
              AND (p.project_id IN
                                (397198, 68164, 351378, 280015, 281991, 68194)
                  )
              AND tsl.work_item_id = p.project_id
              AND tsl.work_item_set_id = p.master_project_id
              AND mp.project_id = p.master_project_id
              AND mp.project_id = ps.source_id
              AND ps.source_entity_id = 6
              AND ps.region_id = r.region_id
         GROUP BY p.project_id

I was asked by one of our customers if adding an ORDERED hint is OK. It seems they've gotten the performance of the above SQL down to 4.5 sec by adding the hint. No logic was given on why adding the ORDERED hint would make it fast. Or, better - what's actually wrong with the above query. I took a look and found ( I needed to take a look at the data model as well)-

(1) Information - The only attribute that needs to be SUMmed over PROJECT_ID is TSLD.ACTUAL_TIME. ACTUAL_RATE remains constant throughout a project.

How it can help performance - We can remove the GROUP BY. We can just use SUM() analytical function on TSLD.ACTUAL_RATE over (partition by project_id)

(2) What's bad - SUM(NVL(column,0)) is redundant! Whenever SUMming -- all NULLs are ignored (i.e., treated as zero) anyways.

(3) Information - The PL/SQL function kcst_cost_util.get_currency_exchange_rate is called too many times. Let's see the logic behind -


NVL(SUM(NVL (tsld.actual_time, 0) * NVL (tsl.actual_rate, 0) * DECODE((SELECT kcst_cost_util.get_currency_exchange_rate(r.currency_id,p.scheduled_start_date) FROM DUAL),
                                                                                  (SELECT kcst_cost_util.get_currency_exchange_rate(tsl.currency_id,p.scheduled_start_date) FROM DUAL), 1,
                                                                             ((SELECT kcst_cost_util.get_currency_exchange_rate(r.currency_id,p.scheduled_start_date) FROM DUAL)
                                                                            / (SELECT kcst_cost_util.get_currency_exchange_rate(tsl.currency_id,p.scheduled_start_date) FROM DUAL)))),0
                                                                              ) cost_to_date_lcl



if
(SELECT kcst_cost_util.get_currency_exchange_rate(r.currency_id,p.scheduled_start_date) = (SELECT kcst_cost_util.get_currency_exchange_rate(tsl.currency_id,p.scheduled_start_date) FROM DUAL)
then                   nvl((SUM(tsld.actual_time) * tsl.actual_rate),0)
else
        NVL( SUM(tsld.actual_time) * tsl.actual_rate * (SELECT kcst_cost_util.get_currency_exchange_rate(r.currency_id,p.scheduled_start_date) FROM DUAL)
                                                                            / (SELECT kcst_cost_util.get_currency_exchange_rate(tsl.currency_id,p.scheduled_start_date) FROM DUAL)), 0)
end if;

Now, isn't (for most of the cases -- there're exceptions though!) --


if F(x) = F(y)  then
x = y;
end if;

Thus we can simplify the part from the SQL above from -



DECODE((SELECT kcst_cost_util.get_currency_exchange_rate(r.currency_id,p.scheduled_start_date) FROM DUAL),
                                                                                  (SELECT kcst_cost_util.get_currency_exchange_rate(tsl.currency_id,p.scheduled_start_date) FROM DUAL), 1,
                                                                             ((SELECT kcst_cost_util.get_currency_exchange_rate(r.currency_id,p.scheduled_start_date) FROM DUAL)
                                                                            / (SELECT kcst_cost_util.get_currency_exchange_rate(tsl.currency_id,p.scheduled_start_date) FROM DUAL)))

To -

(I checked that the function will return the same value for same variable. i.e., deterministic. In other words, I would not be able to transform the following without a look at the source code.



                                      DECODE(r.currency_id,tsl.currency_id, 1,
                                 ((SELECT kcst_cost_util.get_currency_exchange_rate(r.currency_id,p.scheduled_start_date) FROM DUAL)
                                / (SELECT kcst_cost_util.get_currency_exchange_rate(tsl.currency_id,p.scheduled_start_date) FROM DUAL)))

The final optimized version of the query looked like (I actually did not check in the immediate above transformation since the gain was minute, but the logic still holds) --



SELECT     project_id,
          NVL(actd, 0) actuals_to_date,
          NVL((actd * actual_rate/ tslc),0) cost_to_date_bse,
          NVL((actd * actual_rate * DECODE(rcri, tslc, 1,(rcri / tslc))),0) cost_to_date_lcl
FROM
(SELECT p.project_id, SUM(tsld.actual_time) over (PARTITION BY p.project_id) actd , NVL(tsl.actual_rate, 0) actual_rate,
(SELECT kcst_cost_util.get_currency_exchange_rate(tsl.currency_id,p.scheduled_start_date) FROM DUAL) tslc,
(SELECT kcst_cost_util.get_currency_exchange_rate(r.currency_id,p.scheduled_start_date) FROM DUAL) rcri                                  
             FROM kdrv_projects p,
                  kdrv_projects mp,
                  kdrv_project_setups ps,
                  knta_regions r,
                  ktmg_time_sheet_lines tsl,
                  ktmg_time_sheets tts,
                  ktmg_time_sheet_line_details tsld
            WHERE tts.status_code != 'CANCELLED'
              AND tsl.time_sheet_id = tts.time_sheet_id
              AND (   tsl.work_item_type_code = 'TASK'
                   OR tsl.work_item_type_code = 'PROJECT'
                  )
              AND tsld.time_sheet_line_id = tsl.time_sheet_line_id
              AND tsld.activity_id = -1
              AND (p.project_id IN
                                (397198, 68164, 351378, 280015, 281991, 68194)
                  )
              AND tsl.work_item_id = p.project_id
              AND tsl.work_item_set_id = p.master_project_id
              AND mp.project_id = p.master_project_id
              AND mp.project_id = ps.source_id
              AND ps.source_entity_id = 6
              AND ps.region_id = r.region_id)

The improvements -
(1) I evaluated the PL/SQL function values in the sub-query itself.
(2) I removed the group by and used Oracle Analytic Function SUM() instead (Great when you really are grouping by just a few of many columns present in the SELECT clause).
(3) I removed the redundant (sum(nvl(column,0))) to sum(column)

Final version of the query ran, as expected, beating the hinted version by a whole 2 seconds!

posted by Nilendu # 2:17 PM

Comments:

This comment has been removed by the author.

# posted by

Pradeep : 9:18 AM

Hi Nilendu,
This is great stuff for someone who is looking to optimize performance for their ITG implementation.
We have a client who has serious issue with performance with their ITG implementation as they have global users and are looking for some way to optimize access to users. Wanted to check if materialized views can be used to have data at local offices so that atleast portlets access will be local rather than over the network. (I am not a Oracle expert and I am working with the client to check if this is a possibility). If you have any idea on what could help them do let me know.

- pradeepkmrm@gmail.com

# posted by

Anonymous : 9:21 AM

Oracle Performance and Engineering

Thursday, October 12, 2006

Fast == Not repeating yourself

Archives