Oracle Performance and Engineering

Random notes on database features, code, practice, optimization and techniques.

Monday, March 07, 2005

 

SMON eating up the CPU - A Real-life case study

Last year, I was diagnosing CPU spike in a certain site. Strangely enough, analyzing the issue led to Oracle's SMON process being responsible for occupying a full CPU cycle for hours, and with no intention of slowing down or stopping.

SMON does the following work -

(1) Temporary Segment Cleanup
(2) Free Space Coalescing (only for dictionary-managed tablespaces)
(3) Instance recovery (if needed) at the instance startup

Whenever things like "it was working fine till yesterday" happen - I try to collect a list of "what all were changed since yesterday".

It was found that the application was creating lots of temporary objects - a *lot* on session-level. The objects were created with a public synonym. But, here was the deal, when the objects were dropped -- the public synonyms were NOT dropped. Thus they remained in the system. The DBA found an exorbitant number of rows in DBA_OBJECTS (thanks to millions of un-dropped public synonyms accumulated over years!) and in the night ran a batch job to drop them. Next morning, they come in and boom! SMON is showing to be the CPU hogger with no sign to retire.

The DBA_OBJECTS is a view on OBJ$ - which resides in SYSTEM tablespace. On this particular installation SYSTEM tablespace was dictionary managed (a legacy from 8i). So suddenly the OBJ$ table is down from 4million records to just over a few thousands - and SMON is trying really hard to clean up the "holes" in dictionary-managed SYSTEM tablespace. OBJ$ table is actually part of a cluster - thus making the process even slower.

The proof of this theory were -

(a) V$SQL showing the query which directs to obj$ table being accessed an inordinate number of times
(b) The result from "select count(*) from sys.obj$" query is dropping with time.

The SMON cleanup cost 12.5% of the horsepower in that production environment. The server had 8CPUs and SMON was clogging 1 of them for too long.

We figured out that event '10052 trace name context forever'; stops SMON from coalescing free space in a dictionary-managed tablespace. We had to put that on init.ora and restart the database to make sure SMON does not go power hungry again.

On the weekend, during very low-usage - we would again restart the database and remove the parameter to unleash the evil force of SMON ;)

Lesson? Yeah. We've dropped objects from a production database and faced problems. Thus, even if not directly linked with other functionality - changes in a working environment could have painful effects. As the adage goes, if it ain't broke ....

C.R. talk -

An awesome example of problem solving - the right way (analysis, defining, solve, test, refine..in a very out-of-the-box way) and the wrong way (trying iterations based on 'past examples' that may have worked in a different environment, painful, expensive, without any environment analysis).

Comments: Post a Comment



<< Home

Archives

January 2005   February 2005   March 2005   April 2005   May 2005   March 2006   April 2006   August 2006   September 2006   October 2006   January 2007   May 2008  

This page is powered by Blogger. Isn't yours?