Random notes on database features, code, practice, optimization and techniques.
Many times, when they come to me with a particular SQL tuning issue, developers are well armed with the "explain plan" information. These days it's really easy to see the "explain plan" output (the plan that optimizer is supposed to take to execute the statement) thanks to client-end defvelopment tools like TOAD etc. Actually you don't even have to execute the SQL statement to see the plan. The pain is, often these tools put a scary
red highlights on the likes of
full table scans,
cartesian joins etc etc.
So often developers come and the conversation that follows is like -
Developer : "Hey! The SQL for rolling-up cost on the dashboard is superslow. Ummmm..I guess it's doing a full table scan. Plus,
I guess, there's some kind of weird
cartesian join going on there."
(Why do they say "I guess" so many times anyways?)
Well, well! This is one episode of "Friends" that would be called "Where they knew why there was a problem but were wrong (and say similar things again in future!)".
During mid-90s people tuned by ratios. A ratio looked good and the system got a high grade of reliability. Imagine, if a pilot's blood alcohol test is ratio-based what will happen. Say, you legistate a rule that if the pilot's blood alcohol level is lower than 1/10th of the average of the passengers he could be allowed to fly. Yeah, right! Good luck to your honeymoon flight to Hawaii!! Ratios, IMHO, are best used in presentation to people who are not actually doing the work. You being a CEO presenting the business facts to wall street types should use ratios. But try telling your employees something like "the average pay hike last year was 11% more than the industry median..". Good luck again! Ratios are no-no for field.
Then came the "symptom based" tuning. Run a bunch of scripts, figure out waits, resource crunches and accordingly, add resources. Then buy expensive monitoring software. Design changes? Oh yeah! let's go J2EE "
based". The underlying core database code got left out of the whole thing. "Full table scans should be fixed" is one of the symptoms that's very high on this group's target list.
I have only one word to them. Evolve!
However, as the heading tells, EXPLAIN PLAN - specially those from
TOOLS could often be wrong. Not only they could be different from the real execution path of the statement but they could be different from execution plan derived from other tools (like, SQLPlus).
Don't believe me?
See this. Here explain plans differed between SQL*Plus and a monitoring tool because the tool was using character string by default for all bind variables!!
What I am trying to say is, in some cases, none of these could actually be the
real execution plan of the statement. The real execution plan of a statement can only be gotten from
tkprof-ing the trace file (from 9i onwards, the execution path is shown by default after
tkprof-ing a trace file). Note, that we have an "explain" option for tkprof -- and that's really the same as EXPLAIN PLAN in SQL*Plus, i.e., could really be misleading and different from the real one.
Here's what the
Oracle Manual says about Explain Plan statement -
"
With the query optimizer, execution plans can and do change as the underlying optimizer inputs change. EXPLAIN
PLAN
output shows how Oracle runs the SQL statement when the statement was explained. This can differ from the plan during actual execution for a SQL statement, because of differences in the execution environment and explain plan environment."
Three key phrases here -
(a) how Oracle runs (for clarity, should be "
would probably have run") the SQL
(b) This can differ from the plan during actual execution
(c) 'because of differences in the ..environments'
So what could change between the "real execution environment" and the "explain plan environment"?
(1) To start, let's say the 'sort area size' or 'hash area size'. After 9i, they are governed by 'workarea_size_policy'. If it's set to AUTO -- Oracle will allocate and de-allocate session memory for sorting, hashing etc to sessions dynamically. That will be a part of the afore-mentioned "environment" change.
(2) For example, a higher value of the parameter "optimizer_dynamic_sampling", for example, analyzes the structure dynamically in run-time. Thus the explain plan - which is derived without actually executing the SQL - could vary from the real plan in the trace file.
(3) The SQL could be aged out of the shared pool
(4) The object could have been re-analyzed, changed or another session or system level parameter that affectes execution path may have been changed in between, and many many other things.
So, EXPLAIN PLAN really is an indicator of how things are supposed to happen.
To see how things actually did happen one should always either
(a) Trace the SQL / Session and do a "tkprof" (without the EXPLAIN option)
or,
(b) (from Oracle 9i and above only), use the
V$SQL_PLAN data dictionary view, if the SQL is still in V$SQLAREA (Shared Pool).
In a nutshell, beware of just looking at the EXPLAIN PLAN and take an uninformed decision to fix your problem.