Oracle Performance and Engineering

Random notes on database features, code, practice, optimization and techniques.

Wednesday, February 02, 2005

 

DYNAMIC SQL - don't just use it because it's there!!

The following write-up shows how a "high-level" problem was broken down and resolved through step-by-step, logical improvements.

The Problem -

Today I finally got to investigate a "critical performance issue". Functionally, there was some sort of "clean-up" operation that ran every 20 minutes and - depending on data volume - could take more than 20 minutes to finish. The next runs thus kept piling up in the queue, keeping the CPU perpetually spiked.

Analysis -

I won't go into much detail now - tired after a long day - but to diagnose it I ran the particular PLSQL code in (a) SQL Trace mode and (b) PLSQL Profiler mode (my favourite for PLSQL - DBMS_PROFILER rocks. I wish there were a similar tool that granularized timing data per line (or step) of a SQL query; agreed, it's a tough job!).
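
For the record, here's a minimal sketch of how I wrap such a run - the batch entry point name below is made up:

--- sketch begins

alter session set sql_trace = true;    -- (a) raw trace file, format with TKPROF

declare
  l_ret binary_integer;
begin
  -- (b) per-line timings collected into the PLSQL_PROFILER_* tables
  l_ret := dbms_profiler.start_profiler(run_comment => 'cleanup batch run');
  cleanup_batch_job;                   -- hypothetical name of the batch entry point
  l_ret := dbms_profiler.stop_profiler;
end;
/

--- sketch ends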

From the trace data, the total time (with our small data subset) for the batch job was 486 seconds - doing 16M IOs. Now, just doing those 16M IOs on our system would take something like 45 seconds flat! So most of the remaining work was going into "context switches" - and probably being wasted.

The PLSQL profile in HTML format (Oracle gives a fantastic script to HTML-format PLSQL profile data; proftab.sql sets up the profiler tables) showed more than 90% of the time was spent in 4 places. Plus, there were a lot of latch-related waits.
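
If you don't have the formatting script handy, the hot spots can also be pulled straight out of the profiler tables that proftab.sql creates - something like:

--- sketch begins

select u.unit_owner, u.unit_name, d.line#,
       d.total_occur, d.total_time
  from plsql_profiler_units u,
       plsql_profiler_data  d
 where d.runid       = u.runid
   and d.unit_number = u.unit_number
   and u.runid       = :run_id        -- the profiler run to analyze
 order by d.total_time desc;

--- sketch ends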

Main Culprit? -

When I was scanning the TKPROF output file, I saw a *lot* of statements, each like --

delete from exceptions ex
where source_id = 12234
and source_entity_id = 6
and last_update_date > to_date('2005-01-12 10:42:05', 'YYYY-MM-DD HH24:MI:SS')

Each of these statements was doing around 10,000 Consistent Gets (a bit too much for the EXCEPTIONS table, which has a concatenated index on SOURCE_ID and SOURCE_ENTITY_ID). The trace plan showed the index was NOT being used!
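
(For reference, the index in question looks something like this - the exact DDL is my reconstruction from the column list; only the name and columns are known:)

create index exceptions_n1 on exceptions (source_id, source_entity_id);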

Source of the problem - Where from? -

A quick glance at the database source

SELECT * FROM USER_SOURCE WHERE UPPER(TEXT) LIKE '%DELETE%EXCEPTIONS EX%';

gave me the name of the PLSQL package body the statement was coming from (the whole thing was a PLSQL batch job, so I just checked the stored procedures in the database).

And boom!

The package body had -

--- code snippet begins

l_sql_stmt :=
'DELETE FROM exceptions ex' ||
' WHERE source_id = ' || p_source_id ||               -- literal, not a bind
' AND source_entity_id = ' || p_source_entity_id ||   -- literal, not a bind
' AND last_update_date > to_date(''' || to_char(p_last_update_date, 'YYYY-MM-DD HH24:MI:SS') || ''', ''YYYY-MM-DD HH24:MI:SS'')';

l_cursor_id := dbms_sql.open_cursor;
dbms_sql.parse(l_cursor_id, l_sql_stmt, dbms_sql.v7);  -- a brand new hard parse, every single time
l_row_count := dbms_sql.execute(l_cursor_id);
dbms_sql.close_cursor(l_cursor_id);

--- code snippet ends

Why Bad? -

Some of the WORST PLSQL coding I've ever seen in my life. And yes, I am not exaggerating at all.

The developer used "DBMS_SQL" (dynamic SQL) and thought it was "cool". But why do we really need dynamic SQL here? We are just passing parameters (plain variables!) into the SQL.

The statement prevented binding: the developer not only used dynamic SQL in a totally wrong context, but also made sure the statement used no bind variables by skipping DBMS_SQL.BIND_VARIABLE altogether. That is why the statement showed up over and over in the trace file with hard-coded literals. On top of that - probably because of the comparison of the numeric source_id against a concatenated 'string' (source_id = ' || p_source_id ||') - the index on source_id was not being used!

(Remember, the one thing that genuinely calls for dynamic SQL - "identifiers" such as table or column names - can NEVER be bound; plain values like these parameters always can be.)

This also explained the huge latch-related waits in the trace file: the most frequently executed SQL in the batch process used no bind variables, so every execution was a hard parse hammering the shared pool and library cache latches.
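
For completeness - even if dynamic SQL had genuinely been needed here, the values could still have been bound through DBMS_SQL.BIND_VARIABLE. A sketch (the :b_* bind names are mine):

--- sketch begins

l_sql_stmt := 'DELETE FROM exceptions ex' ||
              ' WHERE source_id = :b_source_id' ||
              ' AND source_entity_id = :b_source_entity_id' ||
              ' AND last_update_date > :b_last_update_date';

l_cursor_id := dbms_sql.open_cursor;
dbms_sql.parse(l_cursor_id, l_sql_stmt, dbms_sql.native);
dbms_sql.bind_variable(l_cursor_id, ':b_source_id',        p_source_id);
dbms_sql.bind_variable(l_cursor_id, ':b_source_entity_id', p_source_entity_id);
dbms_sql.bind_variable(l_cursor_id, ':b_last_update_date', p_last_update_date);
l_row_count := dbms_sql.execute(l_cursor_id);
dbms_sql.close_cursor(l_cursor_id);

--- sketch ends

One shared cursor, soft parses from then on, no latch storm.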

To me, this is a criminal offence. Or well, almost.

Fix -

Just recompile the package, commenting out the code snippet above and replacing it with -

-- new code snippet starts

delete /*+ index(ex exceptions_n1) */ from exceptions ex
where source_id = p_source_id
and source_entity_id = p_source_entity_id
and last_update_date > p_last_update_date;


-- new code snippet ends

This is STATIC SQL. PLSQL turns the parameters into bind variables automatically, so every time this delete is issued it reuses the same shared cursor and the same plan (static SQL gives you bind variables for free).
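
A quick way to see the difference afterwards (just a sanity check; the normalized text and generated bind names like :B1 depend on the PLSQL compiler):

--- sketch begins

select sql_text, executions, parse_calls
  from v$sql
 where upper(sql_text) like 'DELETE%FROM EXCEPTIONS%';

-- expect ONE cursor along the lines of
--   DELETE FROM EXCEPTIONS EX WHERE SOURCE_ID = :B3 AND ...
-- with many executions, instead of hundreds of one-shot cursors full of literals

--- sketch ends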

Guess what the run time was after just this basic, silly change? Down to 267 seconds - from 486 seconds.

There's always more to do -

There were more changes made. As you can imagine, where there are silly errors like the above, there are usually 'deeper' issues too (especially around bulk operations and the like). After fixing those - it would take longer to explain, but in short: using the new FORALL and BULK COLLECT rather than populating PLSQL tables through row-by-row FOR loops, using CASE to compress several SQLs into one, etc. - the whole batch job was finishing in less than 100 seconds. I am still waiting to hear from the functional guy: there is a row-level trigger on the table that exists only for this batch procedure, and I would rather do the trigger's work in one shot after the whole batch job is done. Once that change is implemented, it should come down to around 75 seconds. Roughly a 6.5x improvement (486 seconds down to 75) on the most annoying code within a few hours - not too bad.
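
For flavour, the general shape of the bulk rewrite (everything here except the EXCEPTIONS table is a made-up stand-in):

--- sketch begins

declare
  type t_id_tab is table of exceptions.source_id%type;
  l_ids t_id_tab;
begin
  -- one round trip to fetch the working set, instead of a row-by-row loop
  select source_id
    bulk collect into l_ids
    from exceptions_staging          -- hypothetical driving table
   where status = 'PENDING';

  -- one round trip to apply all the deletes
  forall i in 1 .. l_ids.count
    delete from exceptions
     where source_id = l_ids(i);
end;
/

--- sketch ends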


