Oracle Performance and Engineering

Random notes on database features, code, practice, optimization and techniques.

Wednesday, August 23, 2006

 

One thing I learnt today (or, something I knew was wrong!)

A few months back, I was investigating a slow query for a client that had a huge IN-list (over 1100 items in IN-list). The IN-listed column was a unique key and yet the query was not using an index (complicated query fired on a UNION ALL-ed view -- each part of the UNION ALL - separately fired - would use the index however). I was trying to play with statistics and asked for generating statistics with histogram across the schema. Or, asked to fire the following -



exec dbms_stats.gather_schema_stats(ownname => 'my_owner',method => 'FOR ALL COLUMNS SIZE AUTO', cascade => TRUE);



I *thought* that the "AUTO" will decide which columns to collect histograms on. Though the strategy had solved the problem and confirmed my wrong assumption - today I learnt that "AUTO" option will only work - if at all - on a database that's being used in full production capacity for a while; or in other words, has all typical SQL statements fired by the application in the cache.

Given AUTO option, Oracle actually queries SYS.COL_USAGE$ table (that stores details of columns used in WHERE clause) and determines what columns should it collect histograms on. This column usage monitoring is apparently *not* governed by "alter table monitoring on;" (as given wrongly in some documents - including this one on Oracle website). SYS.COL_USAGE$ population depends on an undocumented parameter _COLUMN_TRACKING_LEVEL.

Thus, to collect histograms on the columns with most skewed data - sepcially after upgrade, db restart, migration etc. - we should really use "SKEWONLY" option. While this is time consuming, this one looks like the best option. For subsequent execution - we can always use "REPEAT" (thereby not repeating same calculation to see which columns actually have skewed enough value to warrant a histogram).

I also learnt about a new Oracle 10g-R2 (10.2) optimizer parameter called optimizer_secure_view_merging

Wasn't this achieved with MERGE / NO_MERGE and PUSH_PRED / NO_PUSH_PRED hints? Have to do more testing on this parameter.

Monday, August 21, 2006

 

Why correct datatype is SO important in schema design!

We often, randomly, use a VARCHAR2 for a field that should really be a NUMBER (or DATE). This could have drastic impact on product performance as followed in this case study.

The following SQL was found to about 75% of the total load (logical I/O) from AWRStats report on a 10.1.0.4 database.


SELECT * FROM V WHERE report_type_id = :1;


On querying, I found the view looks like -


CREATE OR REPLACE VIEW V as
SELECT ROWNUM AS MIGRATOR_KEY
, PARENT_RT.report_type_id AS PARENT_MIGRATOR_KEY
, 'RT_SG_REVERSED' AS RESTRICTION_TYPE
, 'Reversed image of RT_SG restrictions' AS RESTRICTION_TYPE_NAME
, PARENT_RT.report_type_id
, PARENT_RT.report_type_name
, E.entity_name
, CHILD_SG.security_group_id
, CHILD_SG.security_group_name
FROM KNTA_Entities E
, KNTA_REPORT_TYPES PARENT_RT
, KNTA_SECURITY_GROUPS CHILD_SG
WHERE CHILD_SG.enabled_flag = 'Y'
AND CHILD_SG.security_group_id NOT IN (
SELECT ER.child_value
FROM KNTA_ENTITY_RESTRICTIONS ER
WHERE ER.restriction_type = 'RT_SG'
AND ER.parent_value = TO_CHAR( PARENT_RT.report_type_id )
)
AND PARENT_RT.report_view_access_code <> 'ALL'
AND E.entity_id = 15 -- security group entity id;



There are (to me) two things wrong with the highlighted part.
(a) It refers to an OUTER table within a NOT IN block.
(b) It compares VARCHAR2 with TO_CHAR(NUMBER)

So, I rewrote the view with a more fitting NOT EXISTS -


CREATE OR REPLACE VIEW v1 AS
SELECT ROWNUM AS MIGRATOR_KEY
, PARENT_RT.report_type_id AS PARENT_MIGRATOR_KEY
, 'RT_SG_REVERSED' AS RESTRICTION_TYPE
, 'Reversed image of RT_SG restrictions' AS RESTRICTION_TYPE_NAME
, PARENT_RT.report_type_id
, PARENT_RT.report_type_name
, E.entity_name
, CHILD_SG.security_group_id
, CHILD_SG.security_group_name
FROM KNTA_ENTITIES E
, KNTA_REPORT_TYPES PARENT_RT
, KNTA_SECURITY_GROUPS CHILD_SG
WHERE CHILD_SG.enabled_flag = 'Y'
AND NOT EXISTS (
SELECT ER.child_value
FROM KNTA_ENTITY_RESTRICTIONS ER
WHERE ER.restriction_type = 'RT_SG'
AND ER.parent_value = PARENT_RT.report_type_id
AND er.child_value = child_sg.security_group_id)
AND PARENT_RT.report_view_access_code <> 'ALL'
AND E.entity_id = 15 -- security group entity id;


And, changed the data types of PARENT_VALUE and CHILD_VALUE from VARCHAR2(80) to NUMBER (easy with 10g and all -- create a new field with desired datatype; update it to fill with values; drop the old column; rename the new column; rebuild the indexes).

The plan changed from (25000 LIO, 2 sec+)


Operation OBJECT Name ROWS Bytes Cost OBJECT Node IN/OUT PStart PStop

SELECT STATEMENT Optimizer MODE=ALL_ROWS 14 K 45912.0421527342
VIEW MITG.KNTA_RESTRICT_RT_SG_MIGE_V 14 K 4 M 45912.0421527342
COUNT
FILTER
NESTED LOOPS 14 K 1 M 332.450479722286
NESTED LOOPS 113 7 K 6.02673530593326
TABLE ACCESS BY INDEX ROWID MITG.KNTA_ENTITIES 1 20 1.00224632611494
INDEX UNIQUE SCAN MITG.KNTA_ENTITIES_U1 1 .00111934092369478
TABLE ACCESS FULL MITG.KNTA_REPORT_TYPES 113 4 K 5.02448897981832
TABLE ACCESS FULL MITG.KNTA_SECURITY_GROUPS 132 3 K 2.88870570279958
TABLE ACCESS FULL MITG.KNTA_ENTITY_RESTRICTIONS 1 34 5.11315481360784



to (doing just about 50LIO -- returning within a fraction of second)


Operation OBJECT Name ROWS Bytes Cost OBJECT Node IN/OUT PStart PStop

SELECT STATEMENT Optimizer MODE=ALL_ROWS 1 338.287484705643
VIEW MITG.V1 1 297 338.287484705643
COUNT
HASH JOIN RIGHT ANTI 1 127 338.287484705643
TABLE ACCESS FULL MITG.KNTA_ENTITY_RESTRICTIONS 224 7 K 5.10449694166571
MERGE JOIN CARTESIAN 14 K 1 M 332.450479722286
NESTED LOOPS 113 7 K 6.02673530593326
TABLE ACCESS BY INDEX ROWID MITG.KNTA_ENTITIES 1 20 1.00224632611494
INDEX UNIQUE SCAN MITG.KNTA_ENTITIES_U1 1 .00111934092369478
TABLE ACCESS FULL MITG.KNTA_REPORT_TYPES 113 4 K 5.02448897981832
BUFFER SORT 132 3 K 327.425990742468
TABLE ACCESS FULL MITG.KNTA_SECURITY_GROUPS 132 3 K 2.88870570279958


Friday, August 18, 2006

 

Yet another bug discovered - MATERIALIZE hint bringing wrong results (9i databases)


WITH temp AS
(
SELECT /*+ MATERIALIZE */
rsc_pool_id
FROM krsc_rsc_pools p
WHERE linked_entity_id = 313
AND linked_entity_primary_key IN (
SELECT DISTINCT org_unit_id
FROM krsc_org_units o
START WITH org_unit_id IN (
SELECT linked_entity_primary_key
FROM krsc_rsc_pools
WHERE linked_entity_id = 313
AND rsc_pool_id IN (30014))
CONNECT BY PRIOR org_unit_id = parent_org_unit_id))
SELECT *
FROM temp;


This query brings totally different results between (a) when it's using MATERIALIZE hint and (b) when it's not (i.e., by default using the INLINE mechanism).

On reading the trace file - I found Oracle is totally re-writing the query and dropped a few filters. This is how the temp table is created (materialized) -


INSERT /*+ APPEND BYPASS_RECURSIVE_CHECK */INTO sys_temp_0fd9d6607_e098df80
SELECT rsc_pool_id
FROM krsc_rsc_pools p
WHERE linked_entity_id = 313
AND linked_entity_primary_key =
ANY (SELECT DISTINCT org_unit_id
FROM krsc_org_units o
CONNECT BY parent_org_unit_id = PRIOR o.org_unit_id
START WITH org_unit_id =
ANY (SELECT linked_entity_primary_key
FROM krsc_rsc_pools))


Notice the filters “WHERE linked_entity_id=313 AND rsc_pool_id IN (30014)” are totally gone!!

Advised by Oracle support, I tried the following three tweaks, none worked -



alter session set "_pred_move_around"=false ;
alter session set "_complex_view_merging"=false ;
alter session set "_unnest_subquery"=false ;



The issue does not reproduce in 10g databases. Finally, Oracle opened a bug - #5421368 on this.

 

Severe Performance issues using TABLESPACE GROUPS w/ multiple TEMP TABLESPACES

In short, please refrain from using “Tablespace Groups” feature on Oracle 10g as there’re some very expensive, ill-optimized recursive checks that could put enormous load on the system – particularly during sorting (or other set- ) operations.

Issue

All "portlets" (snapshot dashboard reports that do sorting) taking (initially) 30~60 sec on first execution. On immediate second execution the time comes down to low single digit figure.

Resolution Details

We resolved the issue in two-phase improvements. Right now, most of the portlets come back within 3~5 seconds on any given execution.

We also expect the same resolution to effect *all* 10.1.0.3 customers to give a much better portlet performance and bring down the total database load on the client DB by around 20%.

Phase One – Data Skewness: Response time improved from 50-60 sec to about 20-30 sec.

We found that more than 95% of the requests belong to just one request type. In other words, requests are not evenly distributed across the request types. We deleted the column statistics on REQUEST_TYPE_ID to trick Oracle into believing that requests are evenly distributed – and thus using the index on request_type_id. This gave us relatively smaller, but very visible improvement of up to 30%-50% on many portlets. Still some of the portlets were taking up to 20 sec!

Phase Two – Oracle Bug /Configuration Issue: Response time improved from 20 sec to 0.2 sec

Observation One - We found that every time there’s a “sort” operation (ORDER BY) in a portlet – the first execution takes up to 20 sec. Immediate next execution comes down to <>


select min(bitmapped)
from
ts$ where dflmaxext =:1 and bitand(flags,1024) =1024

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 261649 10.06 9.72 0 0 0 0
Execute 261649 8.24 7.53 0 0 0 0
Fetch 523298 68.57 68.96 0 18577079 0 261649
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 1046596 86.87 86.22 0 18577079 0 261649

Observation Three – We also found that the said Oracle user uses a “Temporary Tablespace Group” (a new feature in Oracle10g) rather than a single temporary tablespace.

A temporary tablespace group consists of several temporary tablespace and the recursive query above tries to determine which of these tablespaces is best suited for a particular sort-ing (ORDER BY – present in all portlets!) operation. For some reason (probably an Oracle bug) this recursive SQL is ill-optimized and consumes a lot of resource.

Thus, we changed the user from using a “Temporary Tablespace Group” to a “Temporary Tablespace” – so that the above check does not have to occur.

After making the change, we experienced a dramatic improvement.

Here's more on temporary tablespace groups.


Archives

January 2005   February 2005   March 2005   April 2005   May 2005   March 2006   April 2006   August 2006   September 2006   October 2006   January 2007   May 2008  

This page is powered by Blogger. Isn't yours?