Random notes on database features, code, practice, optimization and techniques.
I was asked to review a data import procedure document yesterday. The document is fairly elaborate and confidential, so I am skipping the details here. But a few of my suggestions should work across the board.
Scope of the review: import, compile objects, and analyze statistics in a new schema.
The original import syntax was something like -
imp userid=system/manager@SID FILE=dumpfile.dmp FROMUSER=GOLD_60 TOUSER=GOLD_60 LOG=imp_clm.log BUFFER=5242880 COMMIT=Y COMPILE=Y CONSTRAINTS=Y FEEDBACK=1000000 GRANTS=Y IGNORE=Y INDEXES=Y ROWS=Y SHOW=N STATISTICS=SAFE
Some of my comments were --
(1) Since the statistics are recalculated after the import anyway, why waste time calculating them during the import?
Thus, for both export and import, STATISTICS should be set to NONE.
(2) BUFFER could be set much higher than the 5 MB above, provided nothing else runs during the import.
(3) " COMMIT =Y" is also a risky option.
For large tables which have millions of records, it will do a lot of commits and thus potentially slow things down.
For tables that have CLOB – commit = Y means the imp will commit after EACH ROW.Agreed,
COMMIT = N runs the risk of running out of rollback segment (it commits after inserting the whole table) – but, may be, for cases where speed is an issue, this could be set to N (only after making sure enough rollback space is available).
(4) Same for COMPILE=Y: the objects are compiled later anyway, so why compile them during import? (A revised command incorporating all four points is sketched after this list.)
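Putting these four points together, the revised import command might look like this (the 50 MB BUFFER is only an illustrative value; size it to the memory and rollback space actually available):
imp userid=system/manager@SID FILE=dumpfile.dmp FROMUSER=GOLD_60 TOUSER=GOLD_60 LOG=imp_clm.log BUFFER=52428800 COMMIT=N COMPILE=N CONSTRAINTS=Y FEEDBACK=1000000 GRANTS=Y IGNORE=Y INDEXES=Y ROWS=Y SHOW=N STATISTICS=NONE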
The analyze command to collect statistics was like -
begin
dbms_stats.gather_schema_stats (ownname => 'User',
cascade => TRUE, method_opt => 'FOR ALL COLUMNS SIZE 1');
end;
/
(5) When we specify method_opt => 'FOR ALL COLUMNS SIZE 1', Oracle calculates ONLY ONE histogram bucket per column. That's not very useful for columns that have skewed data. (Statistically, a total of 75 histogram buckets fits most data skews; that is the default bucket count for the old ANALYZE command.) We should use 'SIZE AUTO' instead, where Oracle automatically determines which columns to collect histograms on, based on data skewness.
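The revised block, with only method_opt changed (same placeholder owner 'User'):
begin
dbms_stats.gather_schema_stats (ownname => 'User',
cascade => TRUE, method_opt => 'FOR ALL COLUMNS SIZE AUTO');
end;
/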
(6) Also, to use the new 9i CPU-based "cost model" (search for "New Cost Model"), we should start collecting "system statistics": the CPU speed, disk I/O speed, etc. that Oracle collects from the system and stores in the database to help itself optimize the cost of queries. The call to do so could be --
-- make sure there is at least one job_queue_process
begin
dbms_stats.gather_system_stats (gathering_mode => 'INTERVAL', interval => 360);
-- this will collect system statistics for 6 hours (the interval parameter is in minutes)
end;
/
I've seen the cost model change according to the particular hardware and storage, and I would definitely recommend at least trying to gather system statistics to let the optimizer use the new 9i CPU-based cost model rather than the old I/O-based cost model.
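If a fixed six-hour window does not fit, the same statistics can be gathered in start/stop fashion; this is a minimal sketch, assuming the two calls bracket a representative workload:
begin
dbms_stats.gather_system_stats (gathering_mode => 'START');
end;
/
-- ... let a representative workload run ...
begin
dbms_stats.gather_system_stats (gathering_mode => 'STOP');
end;
/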
The export clauses were not under the scope of the review. However, I noted that COMPRESS=Y could be the biggest offender in the export clause. It should always be set to COMPRESS=N, to avoid tinkering with extent management (a solution to a problem that never existed!).
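For completeness, the matching export command could be something like this (a sketch only; FILE and OWNER simply mirror the import example above):
exp userid=system/manager@SID FILE=dumpfile.dmp OWNER=GOLD_60 LOG=exp_clm.log BUFFER=52428800 COMPRESS=N STATISTICS=NONE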