Kamran Agayev's Oracle Blog

Oracle Certified Master

Archive for the 'Administration' Category

Step by Step Mastering Oracle Database Cloud Service – DBaaS – in one pdf now!

Posted by Kamran Agayev A. on 16th February 2017

Hello guys

After posting a few articles on DBaaS, I’ve decided to create a single pdf file and collect all my cloud related step by step practical blog posts in it. Click on the following image to download the pdf file and become Oracle DBaaS master! :) I will keep updating the pdf and hope it will be much easier fo you to get all articles in one shot!

 

Oracle DBaaS

Posted in Administration, Oracle Cloud | 16 Comments »

RMAN “Catalog start with” returns “no files found to be unknown to the database”

Posted by Kamran Agayev A. on 2nd January 2016

Interesting issue happened today (running 11.2.0.3 DB on Linux) while using “CATALOG START WITH” command in RMAN. Although I was having all backups under /orabackup folder, RMAN didn’t find anything, reporting “no files found to be unknown to the database” message:

RMAN> catalog start with ‘/orabackup’;

searching for all files that match the pattern /orabackup
no files found to be unknown to the database

RMAN>

 

I changed the permission to “chmod 777” and checked the owner, used different patterns like “/orabackup/*”, “/orabackup/” and  “/orabackup/*.bkp” but failed. Then I decided to create a folder under the /orabackup folder, moved all files in it and tried “CATALOG START WITH” command again, and succeeded.

RMAN> catalog start with ‘/orabackup’;

searching for all files that match the pattern /orabackup
no files found to be unknown to the database

RMAN>

[oracle@db1 ~]$ cd /orabackup/
[oracle@db1 orabackup]$ mkdir test
[oracle@db1 orabackup]$ mv *.bkp test/
RMAN> catalog start with ‘/orabackup/test’;

searching for all files that match the pattern /orabackup/test

List of Files Unknown to the Database
=====================================
File Name: /orabackup/test/mydb_database_btqqatr2_1_1.bkp
File Name: /orabackup/test/mydb_database_bsqqakoq_1_1.bkp
File Name: /orabackup/test/mydb_arch_buqqatrp_1_1.bkp
File Name: /orabackup/test/mydb_contolfile_bvqqatsc_1_1.bkp
File Name: /orabackup/test/mydb_spfile_c0qqatsk_1_1.bkp

Do you really want to catalog the above files (enter YES or NO)? YES
cataloging files…
cataloging done

List of Cataloged Files
=======================
File Name: /orabackup/test/mydb_database_btqqatr2_1_1.bkp
File Name: /orabackup/test/mydb_database_bsqqakoq_1_1.bkp
File Name: /orabackup/test/mydb_arch_buqqatrp_1_1.bkp
File Name: /orabackup/test/mydb_contolfile_bvqqatsc_1_1.bkp
File Name: /orabackup/test/mydb_spfile_c0qqatsk_1_1.bkp

RMAN>

 

 

Posted in Administration, RMAN Backup and Recovery | 11 Comments »

Interim patch apply best practices in Oracle

Posted by Kamran Agayev A. on 2nd October 2015

Yestarday, after successfully applying an interim patch to the 3 node clusterware environment, I decided to share my experience with you. In this blog post, you can find some best practicec that I think must be followed before and during patch insallation to bring the downtime and failure risks to the minimum.

First of all, make sure you read the following metalink notes:

Master Note For OPatch (Doc ID 293369.1)
FAQ: OPatch/Patch Questions/Issues for Oracle Clusterware (Grid Infrastructure or CRS) and RAC Environments (Doc ID 1339140.1) 

 

Before applying any interim patches or upgrading the database or the clusterware, make sure you have answers to the following questions:

– Have you tested the patch installation? 

– Have you tested rollback of the patch? 

– What you will do if you can’t rollback the patch with default rollback mechanism? 

– What you will do if you fail to open the database after the patch installation? 

Do you have a backup? Have you tested it? What if you don’t have enough time to restore? 

 

Here is the list of what I would prefer to do before applying any interim patches to the mission critical environment:

– Backup the home folder that is going to be patched

tar -cvf grid_home_before_patch.tar /home/oracle/app/11.2.0

If the patch installation goes wrong and you can’t rollback the patch using default method, restore the backup of the installation home folder and bring the database (clusterware) up.

– Make sure you have a full backup of the database 

Most probably you will not go with the restoration, but you never know what might happen)

– Make sure your backup is recoverable

Yes. This might be a discussion topic, but I strongly believe (as an author of the RMAN Backup and Recovery book :) ) in “If you don’t test your backup, you don’t have a backup” philosophy. Restore it and make sure a) The backup is restorable/recoverable b) Your restore/recover scripts works fine. In my experience, I had a situation where the restore of the backup failed while restoring it for the developers for the testing purpose because of non tested recovery scripts. I heard a situation (a couple of years ago when I attended a wonderful OOW session) where one of the attendees complained how they failed to restore a backup when the production environment failed and this downtime (for days) costed them for a couple of million dollars.

– Make sure you have a Standby database.

Why? Imagine you took 30 minutes downtime to apply the patch and for any reason you were not able to do it and can’t rollback because you are in the middle of the patch apply procedure and trying to fix the issue. Or you can’t rollback the patch for any reason. You stuck! And you don’t have time to solve it. And you are forced to open the database right away. And you can’t do it as well. In this critical case, you can forward the applications to the Standby database. Build up a standby database, make sure archived log files of the production database are shipping to the standby server. You can also perform a failover to test your standby database and build it up again.

– Test the patch apply procedure on the test environment with the “same binaries”. 

Clone the database and clusterware soft to the test machine (or install the same release and apply the same patches as in the production environment) and apply the patch. Get the errors in the test environment before you get them in the production.

– Make sure there’s no any session runinng in the background related with the binaries of the home that is being patched. 

Yesterday, when I was trying to apply an interim patch to the 3 node clusterware (11.2.4) I came up with the following error:

Backing up files…

UtilSession failed: Map failed
java.lang.OutOfMemoryError: Map failed
Log file location: /home/oracle/11.2.0/grid_1124/cfgtoollogs/opatch/opatch2015-10-01_17-40-56PM_1.log

OPatch failed with error code 73

The reason was not the memory at all, we were having a lot of free memory at that time. There were some binary files in use in the background despite the fact that the whole clusterware stack was down. Anyway, I killed all processes and the installation proceeded.

The following metalink note also might be useful – OPatch Apply/Rollback Errors: ‘Prerequisite check “CheckActiveFilesAndExecutables” failed’ (Doc ID 747049.1)

– Download the latest OPatch

Check the following metalink note to download the latest OPatch. How To Download And Install The Latest OPatch Version (Doc ID 274526.1)

Download and extract it under the home folder that is going to be patched. Add the $GRID_HOME/OPatch or $ORACLE_HOME/OPatch to the PATH environment variable. Make sure which opatch command returns you a result

– Make sure you have enough free space in the mount point where the home folder reside.

A few years ago when I was trying to install a patch to the production environment, I decided to try it on the test environment (10 minutes before patching a production database). I ended up with the “There is no free space to proceed the installation” error. Home folder was full, and OPatch was taking backup of the binaries and library files that are being patched.  Check the following metalink note for more informaton: Opatch Fails Updating Archives with ” No space left on device ” Error. (Doc ID 1629444.1)

 – Bring the instance down before patching

If you have a Grid Infrastructure installed, you have a RAC database and you plan to apply the patch node by node without a downtime, bring the instance of the node you’re patching using the following command:

srvctl stop instance -d RACDB -i RACDB1

Why? Because if you start installing the patch and run the rootcrs.pl -unlock command which is the first step that brings the clusterware stack down, the database will be closed with ABORT mode and non of the sessions will be failed over.

– Try to rollback the patch installation at test environment after installing it

Why? Feel how you should rollback (and see if you get any error) the specific patch if you failed the installation and can’t proceed, or you installed successfully, but it caused another bug or problem. Check the following metalink note to learn how to rollback the patch and run opatch lsinventory to make sure it is rollbacked.  How to Rollback a Failed Interim Patch Installation (Doc ID 312767.1)

Sometimes, rollback also might fail :) In this case, the best option is to restore the whole home folder from the backup, but it is not mentioned in ths metalink note OPatch Fails to Rollback Patch Due to Relink Errors (Doc ID 1534583.1)

– Debug the OPatch if it is stuck 

You can use OPATCH_DEBUG=TRUE parameter to debug the OPatch. If it doesn’t generate enough information, use truss (or strace in Liunx) to debug OPatch. Check the following metalink note to learn how to use truss with OPatch. How To Use Truss With OPatch? (Doc ID 470225.1) 

Opatch might also hang due to corrupted jar and java executables. Check this metalink note – opatch napply Hanging (Doc ID 1055397.1)

 

This is all I have :) Please let me know if this document helped you and share your experience with me :) Have a successfull patching days ahead! :)

Posted in Administration, RAC issues | 8 Comments »

ORA-02298: cannot enable name – parent keys not found

Posted by Kamran Agayev A. on 28th August 2015

An interesting case we just got while migrating a data to the different database where we got the following error:

 

ORA-39083: Object type REF_CONSTRAINT failed to create with error:

ORA-02298: cannot enable name – parent keys not found

 

The issue was between 2 tables. It couldn’t enable the foreign key constraint by throwing error that “parent key not found”. The migration was from production database where the foreign key constraint was enabled and it is not possible to have a data in the child table without parent rows.

After quering both tables I found the missing rows, deleted them from the child table and run the query to enable the constraint again – and it succeeded. The reason was because the first table that was exported was the primary table. Then a few rows were added to both primary and child tables (in the production database). Then child table was exported with having those rows which are missing from the primary table (because it was exported firstly)

The solution was to use FLASHBACK_SCN or FLASHBACK_TIME parameter while exporing the data and get all rows that are consistent as of this SCN or TIME

Posted in Administration | No Comments »

ORA-00304: requested INSTANCE_NUMBER is busy

Posted by Kamran Agayev A. on 27th August 2015

There are a lot of explanation and different solutions for the error “ORA-00304: requested INSTANCE_NUMBER is busy”. But today, in my case while I was tyring to shutdown one of the cluster nodes, it hanged. There were no more information related with the hang in the log and trace files, so I went with shut abort and startup and got the following message:

SQL> startup

ORA-00304: requested INSTANCE_NUMBER is busy

SQL>

The second node of the RAC database was up and running. And the instance_number was set to 2. After a little investigation, I found out that there was one process related with the database running on OS (even the database was closed) I killed that session and started the first node and it opened successfully

Posted in Administration, RAC issues | 3 Comments »

Getting ORA-01105 during RAC db startup

Posted by Kamran Agayev A. on 30th July 2014

Today, while starting RAC instances of 2 node RAC database (10gR2 on Linux), I got the following error in the first node:

ORA-01105: mount is incompatible with mounts by other instances
ORA-01677: standby file name convert parameters differ from other instance

 

I checked the alert.log file, but there was no enough information to solve this issue:

Wed Jul 30 09:58:48 AZST 2014
Setting recovery target incarnation to 2
ORA-1105 signalled during: ALTER DATABASE MOUNT…
Wed Jul 30 09:58:58 AZST 2014
SUCCESS: diskgroup DATA was dismounted

 

After playing with some initialization parameters, I found a metalink note where it was defined as a bug (bug13001004)

Check out the following metalink note:

Spfile defined in OCR is not used if one exists in $ORACLE_HOME/dbs (Doc ID 1373622.1)

 

The solution is – to move parameter file to the centralized directory (/ocfs) and remove any instance_name parameter

Posted in Administration, RAC issues | No Comments »

www.OracleVideoTutorials.com

Posted by Kamran Agayev A. on 13th June 2014

Just came across a wonderful blog where you can find dozens of different Oracle related video tutorials

www.OracleVideoTutorials.com

oraclevideotutorials

Posted in Administration | 2 Comments »

Cluster won’t start if diagnostic_dest folder is missing

Posted by Kamran Agayev A. on 3rd March 2014

One of the reason of why cluster won’t start is DIAGNOSTIC_DEST folder is missing. Here it is what I got today in of the nodes of the RAC environment:

db-bash: crs_stat -t

HA Resource Target State
———– —— —–
error connecting to CRSD at [(ADDRESS=(PROTOCOL=IPC)(KEY=ora_crsqs))] clsccon 184

 

While checking alert log file of the clusterware ($GRID_HOME/log/node1/alertnode1.log

[/home/oracle/11.2.0/grid_1124/bin/oraagent.bin(4745)]CRS-5011:Check of resource “+ASM” failed: details at “(:CLSN00006:)” in “/home/oracle/11.2.0/grid_1124/log/node01/agent/ohasd/oraagent_oracle/oraagent_oracle.log”

 

ASM instance failed to start. I connected to ASM instance and tried to start it manually:

db-bash-$ asm

SQL*Plus: Release 11.2.0.4.0 Production on Mon Mar 3 10:32:24 2014

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to an idle instance.

ASM> startup

ORA-48108: invalid value given for the diagnostic_dest init.ora parameter

ORA-48140: the specified ADR Base directory does not exist [/home/oracle/11.2.0/dbhome]

ORA-48187: specified directory does not exist

HPUX-ia64 Error: 2: No such file or directory

Additional information: 1

ASM>

 

ADR Base Directory is missing. After creating it, I successfully started the CRS and got the happiest message :) :

db-bash-$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
db-bash-$

Posted in Administration, RAC issues | 2 Comments »

How to configure Enterprise Manager 11gR2 for RAC environment

Posted by Kamran Agayev A. on 2nd March 2014

After migrating 10gR2 single instance to 11gR2 RAC db, the next step should be configuring Enterprise Manager (if you don’t use centralized Grid Control)

 

It can be done by a simple command. But as usual (at least for me) it didn’t go smooth and I got some errors :). In this article I will show you solutions for some of them.

The command should be run is as follows:

emca -config dbcontrol db -repos create –cluster

 

It will ask you to provide the following information:

db-bash-$ emca -config dbcontrol db -repos create -cluster

 

STARTED EMCA at Mar 2, 2014 3:03:19 PM

EM Configuration Assistant, Version 11.2.0.3.0 Production

Copyright (c) 2003, 2011, Oracle.  All rights reserved.

 

Enter the following information:

Database unique name:

Service name:

Listener port number:

Listener ORACLE_HOME

Password for SYS user:

Password for DBSNMP user:

Password for SYSMAN user:

Cluster name:

Email address for notifications (optional):

Outgoing Mail (SMTP) server for notifications (optional):

ASM ORACLE_HOME :

ASM port :

ASM username [ ASMSNMP ]:

ASM user password:

 

–          Get Database unique name from db_unique_name parameter from the database

SQL> show parameter db_unique_name

–          To provide a service name, get the SCAN listener name from the host (ps –ef | grep tns), get the status of SCAN listener by running lsnrctl status LISTENER_SCAN1 command from $GRID_HOME and get the service name from the output

Service “RACDB.MYCOMP.COM” has 3 instance(s).
     Instance “RACDB1”, status READY, has 1 handler(s) for this service…
     Instance “RACDB2”, status READY, has 1 handler(s) for this service…
     Instance “RACDB3”, status READY, has 1 handler(s) for this service…

 

–          Make sure all password files at all nodes – orapwNODE1[2-3] – are same. If not, create a password file at each node and provide the same password

–          As a Cluster Name, provide the name of the cluster. To get the cluster name, use the following command:

cd $GRID_HOME/bin

./cemutlo –n

–          Make sure you’ve ASMSNMP user created at ASM instance and SYSDBA privilege is granted. Do it in the first node, then copy ora+ASM1 to the different nodes under $GRID_HOME/dbs/+ASM[1-2]

 

Hope above mentioned notes will help you to create EM db control successfully for your RAC environment.

 

To get more information from metalink, use the following MOS:

How to manage DB Control 11.x for RAC Database with emca (Doc ID 578011.1)

 

Posted in Administration | 2 Comments »

V$ASM_DISKGROUP displays information from the header of ASM disks

Posted by Kamran Agayev A. on 17th January 2014

While playing with OCR recovery, suddenly I realized that V$ASM_DISKGROUP view gets information from the headers of the ASM disk files that are specified at *.ASM_DISKSTRING parameter. Here’s the description of V$ASM_DISKGROUP view from documentation:

V$ASM_DISKGROUP displays one row for every ASM disk group discovered by the ASM instance on the node.

http://docs.oracle.com/cd/E18283_01/server.112/e17110/dynviews_1027.htm

 

I got explain plan of V$ASM_DISKGROUP to know which X$ table stand behind it and got – X$KFGRP

SQL> set autotrace on

SQL> select count(1) from v$asm_diskgroup;

COUNT(1)

———-

3

 

pic1

 

 

 

 

 

 

 

SQL> select name_kfgrp from x$kfgrp;

NAME_KFGRP

——————————

 

DATA

FLASH

OCR

 

SQL> select GRPNUM_KFDSK, NUMBER_KFDSK, STATE_KFDSK, ASMNAME_KFDSK, PATH_KFDSK from x$KFDSK;

pic2

 

 

 

 

 

 

 

 

Then I queried ASM_DISKGROUPS parameter :

 

SQL> show parameter disk

 

NAME                                 TYPE        VALUE

———————————— ———– ——————————

asm_diskgroups                       string      OCR, DATA, FLASH

asm_diskstring                       string      /dev/oracleasm/disks

 

No I will create a new tablespace under FLASH diskgroup, create a new table, change owner of the disk of FLASH diskgroup and make it #*disappear* from V$ASM_DISKGROUP view, and then return everything back 

SQL> create tablespace new_tbs datafile ‘+FLASH’;

Tablespace created.

SQL> create table new_table (id number) tablespace new_tbs;

Table created.

SQL> insert into new_table values(1);

1 row created.

SQL> commit;

Commit complete.

SQL> select * from new_table;

ID
———-
1

 

I create a parameter file from spfile, change ASM_DISKGROUPS parameter to OCR,DATA (remove FLASH) and mount the ASM instance again using a parameter file with

ASM_DISKGROUPS=’OCR’,’DATA’ specified:

 

pic7

 

So DISK6 is member and as the disk is discovered by ASM instance, FLASH diskgroup is dismounted, but still there.

Let’s change the owner of the disk and check it again. But before checking the owner, let’s read it’s header by KFED:

[root@node1 disks]# kfed read DISK6

pic4

 

Now let’s start the instance and check V$ASM_DISKGROUP view:
pic5

 

Query X$KFGRP view:


 
SQL> select NAME_KFGRP from X$KFGRP;

NAME_KFGRP

——————————

 

DATA

OCR

SQL>

 

Switch to the database and check if you can query the table:

SQL> select * from new_table;

select * from new_table

*

ERROR at line 1:

ORA-01157: cannot identify/lock data file 8 – see DBWR trace file

ORA-01110: data file 8: ‘+FLASH/rac/datafile/new_tbs.257.837080939’

SQL>

 

 

Now shutdown the ASM instance, return the owner back and check V$ASM_DISKGROUP again:

[root@node1 disks]# chown -R oracle:dba DISK6
pic6

 

FLASH diskgroup appeared, however it’s not specified at ASM_DISKGROUPS parameter. Now mount the diskgroup and query the table again:

SQL> alter diskgroup flash mount;

Diskgroup altered.

SQL>

SQL> select * from new_table;

ID
———-
1

SQL>

 

This means that if you want to move the ASM instance to another host, it’s enough to specify ASM_DISKSTRING parameter, V$ASM_DISKGROUP will discover all diskgroups

Posted in Administration, RAC issues | 2 Comments »