Oracle Performance and Engineering
Random notes on database features, code, practice, optimization and techniques.
Monday, January 31, 2005
A Case of Deleting Duplicate Rows - MAX or MIN
Recently, there was this interesting case where DELETING DUPLICATE ROWS from a large history table (10M+ rows) in batches of 1000 was super slow.
Here's what the trace output looked like (when it was slow):
DELETE FROM E1
WHERE E1.ROWID != (SELECT MIN(E3.ROWID)
                   FROM E1 E3
                   WHERE E1.SOURCE_ID = E3.SOURCE_ID
                     AND E1.SOURCE_ENTITY_ID = E3.SOURCE_ENTITY_ID)
AND ROWNUM < 1001
8.06 seconds
Statistics
----------------------------------------------------------
0 recursive calls
7082 db block gets
1724270 consistent gets
4947 physical reads
666564 redo size
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 9.35 10.22 4929 1724270 7082 1000
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 9.35 10.23 4929 1724270 7082 1000
Rows Row Source Operation
------- ---------------------------------------------------
0 DELETE
1000 COUNT STOPKEY
1000 FILTER
855489 TABLE ACCESS FULL E1
855489 SORT AGGREGATE
1219705 INDEX RANGE SCAN E1_N1 (object id 56159)
After changing MIN to MAX, the query did much less work (and was thus much faster). Statistics below:
DELETE FROM E1
WHERE E1.ROWID != (SELECT MAX(E3.ROWID)
                   FROM E1 E3
                   WHERE E1.SOURCE_ID = E3.SOURCE_ID
                     AND E1.SOURCE_ENTITY_ID = E3.SOURCE_ENTITY_ID)
AND ROWNUM < 1001
2.44 seconds
Statistics
----------------------------------------------------------
0 recursive calls
7101 db block gets
496066 consistent gets
1417 physical reads
671584 redo size
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 2.45 2.75 1389 496066 7085 1000
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 2.45 2.75 1389 496066 7085 1000
Rows Row Source Operation
------- ---------------------------------------------------
0 DELETE
1000 COUNT STOPKEY
1000 FILTER
246650 TABLE ACCESS FULL E1
246650 SORT AGGREGATE
248221 INDEX RANGE SCAN E1_N1 (object id 56159)
So why did merely changing MIN to MAX cut the number of blocks read from 1.7 million to 0.5 million?
My explanation: the difference in work done (and thus time) comes entirely from the first step of execution. The INDEX RANGE SCAN E1_N1 (object id 56159) returns 248K rows to the next level when we use MAX, whereas the very same scan returns 1.2 million rows when we use MIN.
SORT AGGREGATE is defined (in the Oracle docs) as "Retrieval of a single row that is the result of applying a group function to a group of selected rows".
Here the "group function" is MAX or MIN.
E1_N1 is a non-unique index on (SOURCE_ENTITY_ID, SOURCE_ID).
Stats on this index -
BLevel = 2
Leaf Blocks = 69012
Distinct Keys = 6.4 million
Avg. Leaf Blocks per key = 1
Avg. Data Blocks per key = 3 (meaning the rows for one key value are typically spread over 3 table data blocks)
Clustering Factor = 23M
Num Rows = 23 Million
Size = 545MB
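(These figures come straight from the data dictionary; something like the following should reproduce them, assuming the index lives in the current schema. The 545MB would come from the segment, i.e., USER_SEGMENTS.)

SELECT blevel, leaf_blocks, distinct_keys,
       avg_leaf_blocks_per_key, avg_data_blocks_per_key,
       clustering_factor, num_rows
  FROM user_indexes
 WHERE index_name = 'E1_N1';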
A little theory -
The structure of a numeric B-tree index looks somewhat like this:
+---------------+
| 50 and less--|--------+
+------|--more than 50 | |
| +---------------+ |
| |
| |
| |
+---------------+ +---------------+
+-----|-->= 100 | | >40 to 50 |
| | >90 to 100---|---+ | >30 to 40 |
| | >80 to 90 | | +---|-->20 to 30 |
| | >70 to 80 | | | | >10 to 20----|---+
| | >60 to 70 | | | | >0 to 10 | |
| | >50 to 60 | | | | less than 0 | |
| +---------------+ | | +---------------+ |
| | | |
| | | |
| | | |
| | | |
+---------------+ +---------------+ +---------------+ +---------------+
| 101, rowid | | 91, rowid | | 21, rowid | | 11, rowid |
| 102, rowid | | 92, rowid | | 22, rowid | | 12, rowid |
| 103, rowid |<->| 93, rowid |<->| 23, rowid |<->| 13, rowid |
| 104, rowid | | 94, rowid | | 24, rowid | | 14, rowid |
| .... | | .... | | .... | | .... |
+---------------+ +---------------+ +---------------+ +---------------+
The lowest level blocks are "leaf" blocks. The second level blocks are "branch" blocks.
As we know, a B-tree (or "normal") index stores its actual keys in the "leaf" blocks. Each leaf entry has two parts: one, the key value itself (in this case the concatenated value of SOURCE_ENTITY_ID, SOURCE_ID), and two, the physical address (ROWID) of the data row that contains that key. The leaf blocks are chained to their neighbors, forming a doubly linked list.
Each "branch" (the higher level blocks than "leaf") index block exntry contains a maximum key value and a data block address (not a ROWID, but just BLOCKID).
This particular two-column combination, as we see from the index stats, has a lot of repeating values: only about 6 million distinct keys out of 23 million total rows. So when we use MAX, there is a good chance Oracle can do more of its navigation in the "branch" blocks and touch fewer leaf blocks, since a branch entry already bounds the range of values in the leaves beneath it. If a leaf block holds 10 repeating values, MAX can be answered from the top of that range via the branch block, without even visiting the leaf.
Due to the particular data skewness, this results in a fairly lower number of IOs. Very roughly, about 3.6 times less (23M total rows / 6.4M distinct keys). In reality, from the TKPROF output, we see 3.47 times less logical IO (1,724K "MIN" gets / 496K "MAX" gets)!
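As a related aside: for an uncorrelated MIN or MAX on the leading column of an index, Oracle has a dedicated access path. A sketch of what I would expect to see (illustrative, not taken from this system):

SELECT MAX(source_entity_id) FROM e1;

-- Typical plan when the optimizer applies the MIN/MAX optimization:
--   SORT AGGREGATE
--     INDEX FULL SCAN (MIN/MAX) E1_N1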
However, this was very random "test" data generated to try out performance issues. On the production instance I was analyzing, both the MIN and the MAX queries not only followed the same plan but, smartly enough, did a similar amount of IO.
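A closing footnote: if batching by ROWNUM weren't a requirement, one alternative worth trying (a sketch only, untested here; needs the 8i+ analytic functions) is to rank the rows within each duplicate group once and delete everything but the first:

DELETE FROM e1
WHERE ROWID IN (
    SELECT rid
    FROM (SELECT ROWID AS rid,
                 ROW_NUMBER() OVER (PARTITION BY source_id, source_entity_id
                                    ORDER BY ROWID) AS rn
          FROM e1)
    -- rn = 1 is the survivor of each (SOURCE_ID, SOURCE_ENTITY_ID) group
    WHERE rn > 1);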
Friday, January 28, 2005
Higher-Level Performance Observations on 'On-Demand' Applications
Today, in spite of reporting a roaring quarter, Siebel took a nosedive to close at $8.68. Notably, Salesforce.com, its up-and-coming rival in the "on-demand" market, also lost a percent and change on the same day. Starting in 2000 I spent about four years at an on-demand software startup in the Bay Area, and I have some personal observations to share regarding the performance, and therefore the availability, of this type of application.
As far as I remember, Larry Ellison was probably the first big guy in the industry to start talking about "Utility Computing". Utility computing was easy to understand: software would be provided like a utility. Just as you pay the local electric company for the watts you use, you'd pay for the software without having to care about development, upgrades, or maintenance. The logic was (and is) catchy. No one builds a power plant at home to get electricity; neither should anyone run full-fledged software development and maintenance if that's not their core business. "Utility Computing" was preceded by "ASP", or "Application Service Provider". Since then we've said "hosted" (when engineers talk among themselves) or "on-demand" (when talking to potential customers, because IBM marketing demanded we do!).
Salesforce.com went one full step further and made "No Software" its unilateral mantra, sometimes bringing attention to the fact that the Dalai Lama doesn't need software and neither should we lesser mortals. In fact, their US sales phone number is 1-800-NO-SOFTWARE.
Coming back to performance, I believe the following about application performance:
- 'Fast' or 'slow' is defined by the users, not by performance 'analysts', just the same way a stock price is dictated by the buyers and not by the 'analysts'.
- The 'best' mathematical way of achieving 'fastness' is not doing it. Hey, nothing can beat 0 seconds! (well, till you...)
Corollary ONE: Often 'not doing it' is the best business solution as well.
Corollary TWO: The best solution often goes unnoticed in production, since it never manifests as a problem. Please remember this when you review the performance engineer.
- When there's one performance issue, there are usually more. That's exactly why I never let my dad read my grade reports on a day he was tired or upset. People who believe in tracking down and killing the "root of all evil" are perversely optimistic.
The whole 'on-demand' thing, IMHO, is just another delivery model. Do you eat pizza at the restaurant or order a delivery? Eventually it's really your choice. But remember, the business has to be extra careful when it home-delivers. If the pizza guy is late, you have the right not to pay! I guess this is exactly where the 'on-demand' providers should be really careful.
Usually it's the third characteristic above that differentiates the 'on-demand' model from the others. You usually put a lot of customers on the same host (app server or database server). So when even one single customer causes a CPU spike, thanks to the 12 million time-sheet records he needs to approve by month-end, all the other 'innocent' customers sharing the same resources suffer. Humans (and even more so, enterprises) really, really hate being punished for a crime someone else (sometimes a competitor!) committed. Just imagine PFIZER being asked to pay taxes on MERCK's income. They won't like it, will they? Similarly, try telling a customer that he couldn't log in for the whole of yesterday because your biggest customer was "hogging" the entire database server, because "this guy had 400 very complicated change orders he wanted to release". Well, good luck!
Businesses should be extremely aware of this. In the old client-server days you could send a "team" to the irate guy's office to "track and fix" the issue. With the new-age "on-demand", your equivalent options are to send a team to all your customers OR to fix it right there before it spreads to the others. Difficult, yes?
Imagine a really 'cool' kid at your kid's birthday party motivating all other kids to jump and grab all the balloons from wherever they can! I bet it won't take very long for the kids to believe in him or her and follow the leadership. They're very serious in those matters. Same thing happens when a creepy performance or stability issue manifests in the 'on-demand' scenario.
The available resource is more limited now and the competition to grab it is more intense than ever, and the 'heating up' of the system could cause it to go 'boink'.
That, not the marketing, is on-demand's biggest challenge, and the one providers must surmount. Very well-planned resource scheduling, a fine degree of load balancing, intense monitoring, gigantic load tests with realistic production data and setup, and above all a competent performance evangelist are what the business truly needs. Remember, the customers don't need IT anymore; they have entrusted the provider with their data, processes, and applications. But when one of them goes down, all go down, and you can never let that happen, unless you negotiated your severance really well beforehand.
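One concrete example of the resource scheduling I have in mind, on the database side: Oracle's Database Resource Manager can cap how much CPU one tenant's sessions grab under contention. A hedged sketch (the plan and group names are invented; cpu_p1 is the 9i/10g-style syntax):

BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA;
  DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP(
    consumer_group => 'BIG_CUSTOMER',
    comment        => 'the tenant with 12 million time-sheet records');
  DBMS_RESOURCE_MANAGER.CREATE_PLAN(
    plan    => 'TENANT_PLAN',
    comment => 'keep one tenant from starving the rest');
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan             => 'TENANT_PLAN',
    group_or_subplan => 'BIG_CUSTOMER',
    comment          => 'at most 30% of CPU when there is contention',
    cpu_p1           => 30);
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan             => 'TENANT_PLAN',
    group_or_subplan => 'OTHER_GROUPS',
    comment          => 'everyone else shares the remaining 70%',
    cpu_p1           => 70);
  DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA;
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA;
END;
/

The big tenant's sessions would then be mapped to the BIG_CUSTOMER group; the cap only bites when the CPU is actually contended, so idle capacity isn't wasted.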
Thursday, January 27, 2005
Myths Contd. - UNION OR UNION ALL?
Another prevalent myth among SQL coders is "use UNION ALL, don't really use UNION". The reasoning, they believe: since UNION ALL does not sort the data, it's got to be faster. Some people even go to the extent of instituting some kind of standard, writing
SELECT DISTINCT value
FROM (SELECT value FROM a
      UNION ALL
      SELECT value FROM b);

rather than writing

SELECT value FROM a
UNION
SELECT value FROM b;
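The irony is that the optimizer has to deduplicate either way, so the two forms tend to produce essentially the same plan. A sketch of the typical shape (illustrative, not from a real trace):

SORT UNIQUE
  UNION-ALL
    TABLE ACCESS FULL A
    TABLE ACCESS FULL B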
Just a few days back, one developer actually rushed to me when he found that in one query he was debugging, UNION ALL happened to be faster. He had this "OK, finally I found a bug in the Oracle kernel" look! Thankfully I was on my daily dose of Dilbert, and retained my sanity.
Let's just take this somewhat detailed description from someone amazed at the fact that UNION was indeed "faster" in his case. In this particular case, however, the difference in time can probably be attributed to the fact that the data was physically partitioned in this data warehouse (look at the 'UNION-ALL (PARTITION)' in the plan). Also note "partition_view_enabled = true" in the parameter settings. In the faster plan, a partition was eliminated.
For those still thinking UNION can be replaced by UNION ALL and vice versa, a small example:
-- UNION (duplicates removed):
select * from dual
union
select * from dual;
-----------------
X
-- UNION ALL (duplicates not removed)
select * from dual
union all
select * from dual;
------------------
X
X
Here's an example, on Oracle8, where UNION ALL was even slower than the sum of the execution times of its component queries.
Here's an example from Metalink (needs login).
Someone was actually anguished and demanded to know WHY UNION was faster than UNION ALL. A very interesting case with Oracle 8i. Anyone care to illustrate why? I have an opinion on that.
I've done my own tests and found that, up to Oracle 10g, in many different cases UNION performance can be significantly better than UNION ALL.
Let's just say that the two are different. If one could always be replaced by the other, better-performing one, why would there be two operators? Language constructs are not like dress shoes.
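And if you want to know which form is cheaper for your particular query, don't trust folklore; measure it. A minimal SQL*Plus sketch (assumes the PLUSTRACE role is set up; table names are hypothetical):

SET AUTOTRACE TRACEONLY STATISTICS

SELECT v FROM a UNION SELECT v FROM b;
SELECT v FROM a UNION ALL SELECT v FROM b;

SET AUTOTRACE OFF

Compare the "consistent gets" and the "sorts (memory)" / "sorts (disk)" lines between the two runs.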
Wednesday, January 26, 2005
Myths, Folklores and Urban Legends...
Much of the field of "performance tuning" or "optimization" translates, sadly, to "firefighting" in today's terms. As happens in such situations, random permutations of options are tried to solve the issue. In the chaos, one thing may work, and not much later it becomes a panacea. Even though something quite different, perhaps something only indirectly affected by the change, may have actually addressed the issue, the legend is already born.
And there you have the origin of myths like:
(1) You should have 20 Rollback segments
(2) Rebuild Indexes every Sunday at 10AM (lots of overtime for the consultant DBA :D)
(3) Indexes, and tables should be contained within minimum number of extents, if not one.
(4) Indexes and tables should be kept in separate tablespaces, since that somehow allows the database to read tables and indexes "in parallel".
(5) EXISTS is better than IN
(6) UNION is better than OR
(7) Never OUTER JOIN!
(8) Response time is a linear function of data volume (wow!)
(9) When "hit" ratios are 99.99% or so, go home... and so on.
Believe me, I could go on and on.
Tom Kyte and some others have been doing great work debunking these myths every day. But they just keep coming back. Tom, for example, doesn't believe in "Rules of Thumb" (appropriately abbreviated as "ROT"). I like the fact that he actually asks "so how much work is really done?" rather than "let's eliminate all the full-table scans, because, you know, full-table scans are ugly". Well, like ogres, full-table scans come in layers.
Just today I was going through a SQL best-practice guide. The problem with these best-practice authors is that they generalize too much. Every system is unique; you cannot just extrapolate stuff from one and apply it to another. Well, you can, but then let's not claim it's a "solution". Let's say it was a try worth making.
So in this document I see some interesting stuff. It says: "Use EXISTS / NOT EXISTS instead of IN / NOT IN". Really? This is wrong. Completely wrong. Not only is it a lie, but the two forms can sometimes bring back different data as well.
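A minimal sketch of the "different data" point (table names are made up): with a NULL in the subquery, NOT IN returns no rows while NOT EXISTS still returns the row.

CREATE TABLE t1 (id NUMBER);
CREATE TABLE t2 (id NUMBER);
INSERT INTO t1 VALUES (1);
INSERT INTO t2 VALUES (NULL);

-- NOT IN: the NULL makes 'id != NULL' UNKNOWN, so no rows come back.
SELECT * FROM t1 WHERE id NOT IN (SELECT id FROM t2);

-- NOT EXISTS: the correlated probe finds no match, so the row survives.
SELECT * FROM t1
WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t2.id = t1.id);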
And this example of generalizing a particular "workaround" as a "cure" was on the contents page! More on that paper soon.
So, what got me started?
I've been working on databases for about the last 8 years. The beginning was accidental. I was on a team developing a "product" for big stock brokers in India. One of the major problems was the size and rate of the data "packets" arriving from the stock exchange(s): every second, up to 20 packets could arrive, each containing up to 20 records of 20 tickers. This whole dataset was then supposed to go into the database via simple INSERTs into Oracle 7.3. In reality this was a big scalability issue: the product simply could not deal with that much volume, and even if it could insert efficiently, the reads would be slow, and so on.
I 'designed' a shadow-table logic: something really simple that would cap the maximum volume a table could hold and thus ease the reads. It worked. I will talk about the model in a later instalment.
There are two types of work, sometimes converging into one: work you do well, and which other people would therefore love you to do; and work you yourself love doing. I was lucky that what I did well in my first job was also something I loved doing. From then on, it's mostly been performance, databases, and more of the two together!
It's been either the voodoo of putting in an obscure hint like CARDINALITY; changing the data model significantly; 'lightening' the question asked and thereby reducing the work; rewriting the question (the query!); adding faster processors; adding more memory (works much less often!); liposuction of really obese code; patching; using newer features of a particular product line; doing "simple" maintenance to reap benefits; and what not. But the central issue was always performance. Except when the central issue was really availability!
My forecast for the next 5 years: a performance analyst will occupy a very significant position in any software organization. Two reasons: (a) absolute data volume grows every day (every year you file just another round of taxes, right?), and (b) the number and complexity of the 'questions' we ask keep growing enormously ("Are all modules SOX compliant?" was never asked 2 years back). These two factors, combined with an almost methodical practice of ignoring performance at the design level, will cause severe bottlenecks.
This whole series is mostly not about how to "solve" those. Of course, the examples will come from the solutions. But I'd rather focus on what we could have done to not have those problems at all! Hey, it's not always the database optimizer goofing up. Sometimes it's just that the next guy 'forgot' to join a 10-million-record table he had nevertheless dragged into the query!
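To make that last trap concrete, a hypothetical sketch (invented table names): leave out the join predicate and the row source explodes.

-- Intended: one row per order, joined to its customer.
SELECT o.order_id, c.name
  FROM orders o, customers c
 WHERE o.customer_id = c.customer_id;

-- 'Forgot' the join: a Cartesian product. With 10 million orders and
-- 10,000 customers, that's 100 billion rows of work before any filtering.
SELECT o.order_id, c.name
  FROM orders o, customers c;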