CPU usage raised to 100% because of dbresp.pl
Posted by Kamran Agayev A. on January 11th, 2011
Today I got a call from a friend who reported performance degradation on one of the production databases. When connecting with SQL*Plus or RMAN I noticed a delay, so I ran the “top” command to check the running processes on the system. Running ps -ef then showed hundreds of perl processes running on the system:
[sourcecode]oracle 15560 1 3 Jan11 ? 05:50:07 /opt/oracle/product/10.2/db_1/perl/bin/perl /opt/oracle/product/10.2/db_1/sysman/admin/scripts/db/dbresp.pl
oracle 16309 1 3 Jan11 ? 05:44:53 /opt/oracle/product/10.2/db_1/perl/bin/perl /opt/oracle/product/10.2/db_1/sysman/admin/scripts/db/dbresp.pl
…..
…..[/sourcecode]
Since dbresp.pl is located under the sysman folder, I assumed it was related to EM (Enterprise Manager), so I checked the EM agent trace file:
[sourcecode]tail -50 emagent.trc | more
2011-01-11 08:51:37 Thread-4096777120 ERROR fetchlets.oslinetok: Metric execution timed out in 600 seconds
2011-01-11 08:51:37 Thread-4096777120 ERROR command: failed to kill process 24963 running perl: (errno=3: No such process)
2011-01-11 08:51:37 Thread-4096777120 ERROR engine: [oracle_database,prod_db,Response] : nmeegd_GetMetricData failed : Metric execution timed out in 600 seconds
2011-01-11 09:06:37 Thread-4113513376 ERROR fetchlets.oslinetok: Metric execution timed out in 600 seconds
2011-01-11 09:06:37 Thread-4113513376 ERROR command: failed to kill process 25393 running perl: (errno=3: No such process)
2011-01-11 09:06:37 Thread-4113513376 ERROR engine: [oracle_database,prod_db,Response] : nmeegd_GetMetricData failed : Metric execution timed out in 600 seconds
2011-01-11 09:21:37 Thread-4096777120 ERROR fetchlets.oslinetok: Metric execution timed out in 600 seconds
2011-01-11 09:21:37 Thread-4096777120 ERROR command: failed to kill process 26068 running perl: (errno=3: No such process)
2011-01-11 09:21:37 Thread-4096777120 ERROR engine: [oracle_database,prod_db,Response] : nmeegd_GetMetricData failed : Metric execution timed out in 600 seconds
2011-01-11 09:36:37 Thread-4099926944 ERROR fetchlets.oslinetok: Metric execution timed out in 600 seconds[/sourcecode]
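To see how often the Response metric is timing out, the trace lines above can be tallied per hour. This is only a minimal sketch, assuming the emagent.trc line format shown above (date, time, thread, level, message); the function name `timeouts_per_hour` is made up for illustration.

```shell
# Hypothetical helper: count "Metric execution timed out" errors per hour
# in an EM agent trace file. Assumes each line starts with
# "YYYY-MM-DD HH:MM:SS", as in the excerpt above.
timeouts_per_hour() {
    # Count only the fetchlets lines so each timeout is tallied once
    # (the "engine" line repeats the same message for the same event).
    grep 'fetchlets.*Metric execution timed out' "$1" |
        awk '{ split($2, t, ":"); hits[$1 " " t[1] ":00"]++ }
             END { for (h in hits) print h, hits[h] }' |
        sort
}
```

Calling `timeouts_per_hour emagent.trc` prints one line per hour with the number of timeouts logged in that hour. A steady count every hour suggests the agent keeps spawning dbresp.pl and never gets an answer.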
Wow… interesting output. I checked Metalink and found the following note: Server Has 100% Of Cpu Because Of Dbresp.pl [ID 764140.1]
Unfortunately, as a solution that note refers to another Metalink note, “Ext/Mod Problem Performance Agent High CPU Consumption Gen”, which suggests renaming the alert.log file to solve the issue. That wasn’t a real solution, so I decided to take down EM and kill all the processes:
[sourcecode]emctl stop dbconsole[/sourcecode]
Then I ran the following command to list all dbresp.pl processes and generate a script that kills them all:
[sourcecode]ps -ef | grep dbresp.pl | grep -v grep | awk '{print "kill -9 " $2}' > kill.sh
more kill.sh
kill -9 23989
kill -9 24569
kill -9 25145
kill -9 25723
…..
…..[/sourcecode]
Next, I made it executable and ran it:
[sourcecode]oracle@host:~> chmod 755 kill.sh
oracle@host:~> ./kill.sh
oracle@host:~>
oracle@host:~> ps -ef | grep dbresp
oracle 32454 29520 0 10:48 pts/0 00:00:00 grep dbresp [/sourcecode]
After killing all unnecessary processes, CPU usage went down.
To work around this bug, you can have a cron job check the number of running dbresp.pl processes and, when there are too many, take down EM, kill them all and start EM again.
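A cron-driven check along those lines could look like the sketch below. It is only an illustration of the idea, not something taken from the Metalink notes. The threshold of 50, the function names and the `--run` switch are all assumptions, and it expects `emctl` to be on the PATH of the cron environment.

```shell
#!/bin/bash
# Hypothetical watchdog for runaway dbresp.pl processes.
# THRESHOLD is an assumed limit; tune it for your system.
THRESHOLD="${THRESHOLD:-50}"

# Count lines of `ps -ef` output (read from stdin) that mention dbresp.pl.
count_dbresp() {
    awk '/dbresp\.pl/ { n++ } END { print n + 0 }'
}

# Print the PID column (field 2 of `ps -ef`) for dbresp.pl lines on stdin.
dbresp_pids() {
    awk '/dbresp\.pl/ { print $2 }'
}

# Stop EM, kill leftover dbresp.pl processes and start EM again,
# but only when more than THRESHOLD of them are running.
cleanup_if_needed() {
    local snapshot count pid
    snapshot="$(ps -ef)"
    count="$(printf '%s\n' "$snapshot" | count_dbresp)"
    if [ "$count" -gt "$THRESHOLD" ]; then
        emctl stop dbconsole
        # Take a fresh snapshot so only processes that survived the stop are killed.
        snapshot="$(ps -ef)"
        printf '%s\n' "$snapshot" | dbresp_pids | while read -r pid; do
            kill -9 "$pid" 2>/dev/null
        done
        emctl start dbconsole
    fi
}

# Only act when explicitly asked, e.g. from cron: em_watchdog.sh --run
if [ "${1:-}" = "--run" ]; then
    cleanup_if_needed
fi
```

Saved under a name that does not contain "dbresp.pl" (otherwise the script would match itself), it could be scheduled with a crontab entry such as `*/15 * * * * /home/oracle/em_watchdog.sh --run`, where the path and the 15-minute interval are again just placeholders.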
If you have another solution, please let me know
January 11th, 2011 at 1:07 pm
Very helpful post thank you!
January 11th, 2011 at 4:27 pm
Hi Kamran,
I faced the same problem a few months ago in a 10gR2 database and did the same: I killed all the EM processes after shutting down EM.
This is the only solution I found,
Cheers,
Wissem
January 13th, 2011 at 1:54 am
I, too, had this issue in a 10gR2 db. Found the same exact metalink document and did exactly what you did Kamran. But thanks for reminding me of that day! 😉
January 13th, 2011 at 3:53 pm
I’m just curious what kill.sh is for, as you can use ps -ef | grep dbresp.pl | awk '{print "kill -9 " $2}' | bash (or another shell) to do the same job 😉
January 13th, 2011 at 3:56 pm
Haha, you’re right, Ivan. That’s what being a Linux guru means 😉
January 14th, 2011 at 11:48 am
Thanks!
March 1st, 2011 at 7:43 pm
It didn’t work for me on Windows. I stopped EM with emctl and killed all the EM processes, but Oracle is still using 100% CPU. I also created a new alert.log, to no use; it’s still the same.
October 24th, 2012 at 2:06 pm
For us the 100% CPU from dbresp.pl was caused by the TNS Listener hanging. See: http://arjudba.blogspot.ie/2009/01/listener-hangs-child-listener-process.html
To resolve: kill process (dbresp.pl), stop listener, start listener.