Kamran Agayev's Oracle Blog

Oracle Certified Master

Archive for the 'RAC issues' Category

Real Application Clusters

[INS-20802] Creating Container Database for Oracle Grid Infrastructure Management Repository failed

Posted by Kamran Agayev A. on 24th July 2017

After struggling with the root.sh script to configure a 3-node clusterware environment, I finally succeeded, but then OUI returned the following error during the post-configuration step and was unable to create the container database for the Oracle Grid Infrastructure Management Repository:

screenshot

There was no useful information in the mentioned log file. However, tracing of the database creation job was enabled, and I found a long trace file under the log directory with the following messages:

set newname for datafile 1 to new;

set newname for datafile 3 to new;

set newname for datafile 4 to new;

restore datafile 1;

restore datafile 3;

restore datafile 4; }
[Thread-159] [ 2017-07-24 04:24:07.299 EDT ] [RMANEngine.executeImpl:1321] Notify reader to start reading
[Thread-177] [ 2017-07-24 04:24:07.300 EDT ] [RMANEngine.readSqlOutput:988] Log RMAN Output=echo set off
[Thread-177] [ 2017-07-24 04:24:07.300 EDT ] [RMANEngine.readSqlOutput:988] Log RMAN Output=
[Thread-177] [ 2017-07-24 04:24:07.305 EDT ] [RMANEngine.readSqlOutput:988] Log RMAN Output=RMAN> 2> 3> 4> 5> 6> 7> 8> 9> 10> 11> 12> 13>
[Thread-177] [ 2017-07-24 04:24:07.305 EDT ] [RMANEngine.readSqlOutput:988] Log RMAN Output=executing command: SET NEWNAME
[Thread-177] [ 2017-07-24 04:24:07.546 EDT ] [RMANEngine.readSqlOutput:988] Log RMAN Output=
[Thread-177] [ 2017-07-24 04:24:07.547 EDT ] [RMANEngine.readSqlOutput:988] Log RMAN Output=executing command: SET NEWNAME
[Thread-177] [ 2017-07-24 04:24:07.562 EDT ] [RMANEngine.readSqlOutput:988] Log RMAN Output=
[Thread-177] [ 2017-07-24 04:24:07.563 EDT ] [RMANEngine.readSqlOutput:988] Log RMAN Output=executing command: SET NEWNAME
[Thread-177] [ 2017-07-24 04:24:07.578 EDT ] [RMANEngine.readSqlOutput:988] Log RMAN Output=
[Thread-177] [ 2017-07-24 04:24:07.585 EDT ] [RMANEngine.readSqlOutput:988] Log RMAN Output=Starting restore at 24-JUL-17
[Thread-177] [ 2017-07-24 04:24:07.792 EDT ] [RMANEngine.readSqlOutput:988] Log RMAN Output=allocated channel: ORA_DISK_1
[Thread-177] [ 2017-07-24 04:24:07.797 EDT ] [RMANEngine.readSqlOutput:988] Log RMAN Output=channel ORA_DISK_1: SID=18 device type=DISK
[Thread-177] [ 2017-07-24 04:24:08.051 EDT ] [RMANEngine.readSqlOutput:988] Log RMAN Output=
[Thread-177] [ 2017-07-24 04:24:08.383 EDT ] [RMANEngine.readSqlOutput:988] Log RMAN Output=channel ORA_DISK_1: starting datafile backup set restore
[Thread-177] [ 2017-07-24 04:24:08.385 EDT ] [RMANEngine.readSqlOutput:988] Log RMAN Output=channel ORA_DISK_1: specifying datafile(s) to restore from backup set
[Thread-177] [ 2017-07-24 04:24:08.386 EDT ] [RMANEngine.readSqlOutput:988] Log RMAN Output=channel ORA_DISK_1: restoring datafile 00001 to +DATA
[Thread-177] [ 2017-07-24 04:24:08.387 EDT ] [RMANEngine.readSqlOutput:988] Log RMAN Output=channel ORA_DISK_1: reading from backup piece /u01/app/12.2.0.1/grid/assistants/dbca/templates/MGMTSeed_Database.dfb
[Thread-177] [ 2017-07-24 04:24:23.487 EDT ] [RMANEngine.readSqlOutput:988] Log RMAN Output=RMAN-00571: ===========================================================
[Thread-177] [ 2017-07-24 04:24:23.487 EDT ] [RMANEngine.readSqlOutput:988] Log RMAN Output=RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
[Thread-177] [ 2017-07-24 04:24:23.487 EDT ] [RMANEngine.readSqlOutput:988] Log RMAN Output=RMAN-00571: ===========================================================
[Thread-177] [ 2017-07-24 04:24:23.487 EDT ] [RMANEngine.readSqlOutput:988] Log RMAN Output=RMAN-03002: failure of restore command at 07/24/2017 04:24:23
[Thread-177] [ 2017-07-24 04:24:23.487 EDT ] [RMANEngine.readSqlOutput:988] Log RMAN Output=ORA-19870: error while restoring backup piece /u01/app/12.2.0.1/grid/assistants/dbca/templates/MGMTSeed_Database.dfb
[Thread-177] [ 2017-07-24 04:24:23.487 EDT ] [RMANEngine.readSqlOutput:988] Log RMAN Output=ORA-19872: Unexpected end of file at block 4800 while decompressing backup piece /u01/app/12.2.0.1/grid/assistants/dbca/templates/MGMTSeed_Database.dfb
[Thread-177] [ 2017-07-24 04:24:23.495 EDT ] [RMANEngine.readSqlOutput:988] Log RMAN Output=
[Thread-177] [ 2017-07-24 04:24:23.496 EDT ] [RMANEngine.readSqlOutput:988] Log RMAN Output=RMAN>
[Thread-177] [ 2017-07-24 04:24:23.496 EDT ] [RMANEngine.readSqlOutput:988] Log RMAN Output=echo set on
[Thread-177] [ 2017-07-24 04:24:23.504 EDT ] [RMANEngine.readSqlOutput:988] Log RMAN Output=set echo off;
[Thread-177] [ 2017-07-24 04:24:23.504 EDT ] [RMANEngine.readSqlOutput:1031] hasError is true
[Thread-177] [ 2017-07-24 04:24:23.504 EDT ] [RMANEngine.readSqlOutput:1037] ERROR TRACE DETECTED

 

So, in order to create the container database, the installer was trying to restore the seed database and failed with the following error:

ORA-19872: Unexpected end of file at block 4800 while decompressing backup piece /u01/app/12.2.0.1/grid/assistants/dbca/templates/MGMTSeed_Database.dfb

So the problem was with the backup piece of the MGMT database. Permissions were fine, so I compared the size of the extracted backup piece with the one inside the downloaded zip file:

[root@oratest01 ~]# cd /u01/app/12.2.0.1/grid/assistants/dbca/templates/
[root@oratest01 templates]# ll
total 131452
-rw-r--r-- 1 oracle oinstall 5734 Jan 26 10:48 DomainServicesCluster_GIMR.dbc
-rw-r----- 1 oracle oinstall 18628608 Jan 26 10:46 MGMTSeed_Database.ctl
-rw-r----- 1 oracle oinstall 5177 Jan 26 10:48 MGMTSeed_Database.dbc
-rw-r----- 1 oracle oinstall 39321600 Jan 26 10:46 MGMTSeed_Database.dfb
-rw-r----- 1 oracle oinstall 10578 Jun 10 2016 New_Database.dbt
-rw-r----- 1 oracle oinstall 76619776 Jan 26 10:11 pdbseed.dfb
-rw-r----- 1 oracle oinstall 6579 Jan 26 10:11 pdbseed.xml
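If you want to compare against the archive without extracting it again, the sizes inside the zip can be listed directly (the zip file name below is just an example, use the grid image you actually downloaded):

unzip -l /stage/linuxx64_12201_grid_home.zip | grep MGMTSeed_Database.dfb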

 

It was 39MB in the extracted folder, but 104MB in the zip file itself.

Screenshot2

Somehow it had not been unzipped correctly. I moved the existing files to a backup folder, copied the backup pieces from the downloaded installation zip file to the same directory on the first node, restarted the configuration, and it succeeded.
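One way to redo that step from the command line (the staging path and zip file name are illustrative, not necessarily the exact ones I used):

cd /u01/app/12.2.0.1/grid/assistants/dbca/templates
mkdir -p /tmp/templates_backup
mv MGMTSeed_Database.* pdbseed.* DomainServicesCluster_GIMR.dbc /tmp/templates_backup/

# re-extract only the seed templates from the downloaded grid image
cd /u01/app/12.2.0.1/grid
unzip -o /stage/linuxx64_12201_grid_home.zip 'assistants/dbca/templates/*'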

Screenshot3

Good Luck!

Posted in RAC issues | No Comments »

Perl related issues when running ./rootcrs.pl to deconfigure the node

Posted by Kamran Agayev A. on 24th July 2017

Today, while deconfiguring a failed node from the clusterware, I hit some Perl-related issues that prevented me from running the ./rootcrs.pl command. After installing a few required packages I was able to deconfigure the node. Check the steps below and let me know if they helped you or if you hit different errors.

 

[root@oratest02 install]$ ./rootcrs.pl -deconfig -force -verbose
-bash: ./rootcrs.pl: /usr/bin/perl: bad interpreter: No such file or directory

 

I checked for perl and didn’t find it.
[root@oratest02 install]$ which perl
/usr/bin/which: no perl in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/oracle/.local/bin:/home/oracle/bin:/home/oracle/.local/bin:/home/oracle/bin:/u01/app/12.2.0.1/grid/bin)

Perl wasn’t installed by default, so I installed it:

[root@oratest02 install]# yum install perl -y

Then I got the following errors and installed the required perl modules as follows:

[root@oratest02 install]$ ./rootcrs.pl -deconfig -force -verbose
Can't locate Env.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 . . ./../../perl/lib) at crsinstall.pm line 286.
BEGIN failed--compilation aborted at crsinstall.pm line 286.
Compilation failed in require at ./rootcrs.pl line 165.
BEGIN failed--compilation aborted at ./rootcrs.pl line 165.


[root@oratest02 install]# yum install perl-Env -y


[root@oratest02 install]$ ./rootcrs.pl -deconfig -force -verbose
Can't locate XML/Parser.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 . . ./../../perl/lib) at crsutils.pm line 770.
BEGIN failed--compilation aborted at crsutils.pm line 770.
Compilation failed in require at crsinstall.pm line 290.
BEGIN failed--compilation aborted at crsinstall.pm line 290.
Compilation failed in require at ./rootcrs.pl line 165.
BEGIN failed--compilation aborted at ./rootcrs.pl line 165.
[root@oratest02 install]# yum install perl-XML-Parser -y
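If you already know the node will need these modules, all three packages can be installed in one go before running the script (assuming a yum-based system):

yum install -y perl perl-Env perl-XML-Parser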

Finally, I was able to run rootcrs.pl and deconfigure the node from the clusterware.

Posted in RAC issues | No Comments »

Interim patch apply best practices in Oracle

Posted by Kamran Agayev A. on 2nd October 2015

Yesterday, after successfully applying an interim patch to a 3-node clusterware environment, I decided to share my experience with you. In this blog post, you can find some best practices that I think must be followed before and during patch installation to keep downtime and the risk of failure to a minimum.

First of all, make sure you read the following metalink notes:

Master Note For OPatch (Doc ID 293369.1)
FAQ: OPatch/Patch Questions/Issues for Oracle Clusterware (Grid Infrastructure or CRS) and RAC Environments (Doc ID 1339140.1) 

 

Before applying any interim patches or upgrading the database or the clusterware, make sure you have answers to the following questions:

– Have you tested the patch installation? 

– Have you tested rollback of the patch? 

– What will you do if you can’t roll back the patch using the default rollback mechanism?

– What will you do if you fail to open the database after the patch installation?

– Do you have a backup? Have you tested it? What if you don’t have enough time to restore?

 

Here is the list of what I would prefer to do before applying any interim patches to the mission critical environment:

– Backup the home folder that is going to be patched

tar -cvf grid_home_before_patch.tar /home/oracle/app/11.2.0

If the patch installation goes wrong and you can’t roll back the patch using the default method, restore the backup of the installation home and bring the database (or clusterware) back up.
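A minimal sketch of both sides of that plan, assuming a /backup staging area and the 11.2 home from the example above (GNU tar strips the leading / when creating the archive, so the restore is done from /):

# before patching: back up the home
tar -cvf /backup/grid_home_before_patch.tar /home/oracle/app/11.2.0

# fallback: with the stack down, restore the home from the backup
cd /
tar -xvf /backup/grid_home_before_patch.tar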

– Make sure you have a full backup of the database 

Most probably you will not need to restore it, but you never know what might happen.

– Make sure your backup is recoverable

Yes, this might be a discussion topic, but I strongly believe (as the author of an RMAN Backup and Recovery book :) ) in the “if you don’t test your backup, you don’t have a backup” philosophy. Restore it and make sure that a) the backup is restorable/recoverable and b) your restore/recover scripts work fine. In my experience, I once had the restore of a backup fail while restoring it for the developers for testing purposes, because of untested recovery scripts. I also heard of a case (a couple of years ago, at a wonderful OOW session) where one of the attendees described how they failed to restore a backup when the production environment crashed, and that days-long downtime cost them a couple of million dollars.
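A quick supplementary check (it does not replace a real test restore, but it does catch missing or unreadable pieces) is to validate the backups with RMAN, for example:

rman target /
RMAN> restore database validate;
RMAN> restore archivelog all validate;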

– Make sure you have a Standby database.

Why? Imagine you took a 30-minute downtime window to apply the patch, something went wrong in the middle of the patch apply, and you can neither finish it nor roll it back. You’re stuck, you have no time left to troubleshoot, you are forced to open the database right away, and you can’t do that either. In this critical situation you can redirect the applications to the standby database. Build a standby database and make sure the archived log files of the production database are being shipped to the standby server. You can also perform a failover to test your standby database and then rebuild it.

– Test the patch apply procedure on the test environment with the “same binaries”. 

Clone the database and clusterware software to a test machine (or install the same release and apply the same patches as in production) and apply the patch there. Hit the errors in the test environment before you hit them in production.

– Make sure there are no sessions or processes running in the background that use binaries from the home that is being patched.

Yesterday, when I was trying to apply an interim patch to the 3-node clusterware (11.2.4), I hit the following error:

Backing up files…

UtilSession failed: Map failed
java.lang.OutOfMemoryError: Map failed
Log file location: /home/oracle/11.2.0/grid_1124/cfgtoollogs/opatch/opatch2015-10-01_17-40-56PM_1.log

OPatch failed with error code 73

The reason was not memory at all; we had plenty of free memory at that time. Some binary files were still in use by background processes even though the whole clusterware stack was down. I killed those processes and the installation proceeded.

The following metalink note also might be useful – OPatch Apply/Rollback Errors: ‘Prerequisite check “CheckActiveFilesAndExecutables” failed’ (Doc ID 747049.1)
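To see which processes still hold files from the home being patched, something along these lines usually works (paths taken from the example above; lsof/fuser availability depends on the platform):

lsof +D /home/oracle/11.2.0/grid_1124 2>/dev/null
fuser /home/oracle/11.2.0/grid_1124/bin/* /home/oracle/11.2.0/grid_1124/lib/* 2>/dev/null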

– Download the latest OPatch

Check the following metalink note to download the latest OPatch. How To Download And Install The Latest OPatch Version (Doc ID 274526.1)

Download and extract it under the home folder that is going to be patched. Add $GRID_HOME/OPatch or $ORACLE_HOME/OPatch to the PATH environment variable and make sure the “which opatch” command returns a result.
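A quick sketch of that setup and check (the home path is an example):

export ORACLE_HOME=/u01/app/11.2.0/grid
export PATH=$ORACLE_HOME/OPatch:$PATH
which opatch
opatch version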

– Make sure you have enough free space in the mount point where the home folder resides.

A few years ago, when I was about to install a patch in a production environment, I decided to try it on the test environment first (10 minutes before patching the production database). I ended up with a “there is no free space to proceed with the installation” error. The mount point of the home folder was full, because OPatch takes a backup of the binaries and library files being patched. Check the following metalink note for more information: Opatch Fails Updating Archives with ” No space left on device ” Error. (Doc ID 1629444.1)
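Checking the free space beforehand is a one-liner (the path is an example):

df -h /u01/app/11.2.0/grid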

 – Bring the instance down before patching

If you have Grid Infrastructure installed, have a RAC database and plan to apply the patch node by node without downtime, bring down the instance on the node you’re patching using the following command:

srvctl stop instance -d RACDB -i RACDB1

Why? Because if you start installing the patch and run the rootcrs.pl -unlock command (the first step, which brings the clusterware stack down), the instance will be closed in ABORT mode and none of the sessions will be failed over.
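Before running the unlock step it is worth confirming that the local instance is really down, for example (using the names from the command above):

srvctl status instance -d RACDB -i RACDB1
ps -ef | grep smon_RACDB1 | grep -v grep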

– Try to roll back the patch in the test environment after installing it

Why? Get a feel for how to roll back that specific patch (and see whether you get any errors) in case the installation fails and can’t proceed, or it installs successfully but introduces another bug or problem. Check the following metalink note to learn how to roll back a patch, and run opatch lsinventory afterwards to make sure it was rolled back: How to Rollback a Failed Interim Patch Installation (Doc ID 312767.1)
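The rollback itself is normally just the following (the patch number is a placeholder), followed by an inventory check:

opatch rollback -id 12345678
opatch lsinventory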

Sometimes the rollback itself might fail :) In that case the best option is to restore the whole home folder from the backup, although that option is not mentioned in this metalink note: OPatch Fails to Rollback Patch Due to Relink Errors (Doc ID 1534583.1)

– Debug the OPatch if it is stuck 

You can set the OPATCH_DEBUG=TRUE environment variable to debug OPatch. If it doesn’t generate enough information, use truss (or strace on Linux) to trace OPatch. Check the following metalink note to learn how to use truss with OPatch: How To Use Truss With OPatch? (Doc ID 470225.1)

Opatch might also hang due to corrupted jar and java executables. Check this metalink note – opatch napply Hanging (Doc ID 1055397.1)
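For example, on Linux (a sketch):

export OPATCH_DEBUG=TRUE
opatch apply

# if the debug output is not enough, trace the system calls
strace -f -o /tmp/opatch.trc opatch apply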

 

This is all I have :) Please let me know if this post helped you and share your experience with me :) Have successful patching days ahead! :)

Posted in Administration, RAC issues | 8 Comments »

ORA-00304: requested INSTANCE_NUMBER is busy

Posted by Kamran Agayev A. on 27th August 2015

There are a lot of explanations and different solutions for the error “ORA-00304: requested INSTANCE_NUMBER is busy”. In my case today, while I was trying to shut down one of the cluster instances, it hung. There was no useful information about the hang in the log and trace files, so I went with shutdown abort and startup, and got the following message:

SQL> startup

ORA-00304: requested INSTANCE_NUMBER is busy

SQL>

The second node of the RAC database was up and running, and its instance_number was set to 2. After a little investigation, I found that one OS process related to the database was still running (even though the instance was closed). I killed that process, started the first instance, and it opened successfully.
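A rough sketch of that check (the instance name is an example, and the PID below is a placeholder):

ps -ef | grep -i racdb1 | grep -v grep
kill -9 <pid_of_the_leftover_process>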

Posted in Administration, RAC issues | 3 Comments »

Default listener “LISTENER” is not configured when running DBCA

Posted by Kamran Agayev A. on 6th January 2015

When running dbca to create a new database you can get the following message:

Default Listener “LISTENER” is not configured in Grid Infrastructure home. Use NetCA to configure Default Listener and rerun DBCA

default_listener_problem

Actually, there’s no need to run netca; all you need to do is create a new listener as follows:

srvctl add listener

srvctl start listener
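You can then confirm that the listener resource is registered and running (a quick check, not part of the original dialog):

srvctl config listener
srvctl status listener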

 

Posted in RAC issues | No Comments »

Node names are missing from ./runInstaller output

Posted by Kamran Agayev A. on 4th January 2015

While installing Oracle Database after Oracle Grid Infrastructure installation, I was supposed to get the list of all nodes where I need to install Oracle Software (11gR2 – 11.2.0.4). But instead, I got nothing

runInstaller_output

I checked the status of the clusterware, it was up and running on both nodes:

[oracle@node1 bin]$ ./olsnodes
node1
node2
[oracle@node1 bin]$ ./crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

 

Then I checked the inventory.xml file and found out that the CRS="true" attribute was missing.

[oracle@node1 bin]$ cat /etc/oraInst.loc | grep inventory_loc
inventory_loc=/u01/app/oraInventory

[oracle@node1 bin] cd /u01/app/oraInventory/ContentsXML/

[oracle@node1 bin] more inventory.xml

<output trimmed>

<HOME NAME="Ora11g_gridinfrahome1" LOC="/u01/app/product/11.2.0.3/grid" TYPE="O" IDX="1">

</output trimmed>

After running the following command, the inventory.xml file was updated and the node list appeared:

[oracle@node1 ~]$ cd /u01/app/product/11.2.0.3/grid/oui/bin/
[oracle@node1 bin]$ ./runInstaller -updateNodeList ORACLE_HOME="/u01/app/product/11.2.0.3/grid" CRS=true
Starting Oracle Universal Installer…

Checking swap space: must be greater than 500 MB. Actual 3919 MB Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /u01/app/oraInventory
'UpdateNodeList' was successful.

 

[oracle@node1 bin] more inventory.xml

<HOME NAME="Ora11g_gridinfrahome1" LOC="/u01/app/product/11.2.0.3/grid" TYPE="O" IDX="1" CRS="true">
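To double-check the flag on the other nodes as well, a simple grep against the inventory is enough (path as in this environment):

grep "CRS=" /u01/app/oraInventory/ContentsXML/inventory.xml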

 

runInstaller_output2

Posted in RAC issues | No Comments »

Struggling with RAC Installation – ORA-15018: diskgroup cannot be created

Posted by Kamran Agayev A. on 9th December 2014

I’ve said it before: the only time I managed to install Oracle Clusterware without hitting any issue was during the OCM exam :) I didn’t hit any bug, I didn’t have to reconfigure anything, and the installation went smoothly. But …

Today, I got all of the following errors :)

ORA-15032: not all alterations performed
ORA-15131: block of file in diskgroup could not be read
ORA-15018: diskgroup cannot be created
ORA-15031: disk specification '/dev/mapper/mpathh' matches no disks
ORA-15025: could not open disk "/dev/mapper/mpathh"
ORA-15056: additional error message
ORA-15017: diskgroup "OCR_MIRROR" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "OCR_MIRROR"
ORA-15033: disk '/dev/mapper/mpathh' belongs to diskgroup "OCR_MIRROR"

In the beginning, while installing Oracle 11g RAC, I got the following error:

CRS-2672: Attempting to start 'ora.diskmon' on 'vsme_ora1'

CRS-2676: Start of 'ora.diskmon' on 'vsme_ora1' succeeded

CRS-2676: Start of 'ora.cssd' on 'vsme_ora1' succeeded

 

Disk Group OCR_MIRROR creation failed with the following message:

ORA-15018: diskgroup cannot be created

ORA-15031: disk specification '/dev/mapper/mpathh' matches no disks

ORA-15025: could not open disk "/dev/mapper/mpathh"

ORA-15056: additional error message

 

 

Configuration of ASM … failed

see asmca logs at /home/oracle/app/cfgtoollogs/asmca for details

Did not succssfully configure and start ASM at /home/oracle/11.2.4/grid1/crs/install/crsconfig_lib.pm line 6912.

/home/oracle/11.2.4/grid1/perl/bin/perl -I/home/oracle/11.2.4/grid1/perl/lib -I/home/oracle/11.2.4/grid1/crs/install /home/oracle/11.2.4/grid1/crs/install/rootcrs.pl execution failed

 

The bad news is that the installation failed. The good news is that I could easily restart it without any issues, because the root.sh script is restartable. If you don’t need to install the software on all nodes again, solve the problem and run the root.sh script again; if the problem is solved, it will run smoothly. If you need to install the software on all nodes, you have to deconfigure and run the installation again. To remove the failed RAC installation, run the rootcrs.pl script on all nodes except the last one, as follows:

$GRID_HOME/crs/install/rootcrs.pl -verbose -deconfig -force

 

Run the following command on the last node:

$GRID_HOME/crs/install/rootcrs.pl -verbose -deconfig -force -lastnode

 

Now, run ./runInstaller command and start the installation again.

 

So let’s go back to the problem. It was claiming that “disk specification '/dev/mapper/mpathh' matches no disks”. Hmm … The first thing that came to mind was the permissions of the disk. I checked them, they were root:disk. I changed the ownership to oracle:dba and ran the root.sh script again. Got the same problem.

I checked the log files under the following directory:

/home/oracle/app/cfgtoollogs/asmca

 

[main] [ 2014-12-09 17:26:29.220 AZT ] [UsmcaLogger.logInfo:143]  CREATE DISKGROUP SQL: CREATE DISKGROUP OCR_MIRROR EXTERNAL REDUNDANCY DISK '/dev/mapper/mpathh' ATTRIBUTE 'compatible.asm'='11.2.0.0.0','au_size'='1M'

[main] [ 2014-12-09 17:26:29.295 AZT ] [SQLEngine.done:2189]  Done called

[main] [ 2014-12-09 17:26:29.296 AZT ] [UsmcaLogger.logException:173]  SEVERE:method oracle.sysman.assistants.usmca.backend.USMDiskGroupManager:createDiskGroups

[main] [ 2014-12-09 17:26:29.296 AZT ] [UsmcaLogger.logException:174]  ORA-15018: diskgroup cannot be created

ORA-15031: disk specification '/dev/mapper/mpathh' matches no disks

ORA-15025: could not open disk "/dev/mapper/mpathh"

ORA-15056: additional error message

 

Oracle wasn’t able to create the diskgroup, claiming that the specified device matches no disks. I logged in to the ASM instance and tried to create the diskgroup on my own:

SQL> CREATE DISKGROUP OCR_MIRROR EXTERNAL REDUNDANCY DISK '/dev/mapper/mpathh' ATTRIBUTE 'compatible.asm'='11.2.0.0.0','au_size'='1M';

SQL> CREATE DISKGROUP OCR_MIRROR EXTERNAL REDUNDANCY DISK '/dev/mapper/mpathh' ATTRIBUTE 'compatible.asm'='11.2.0.0.0','au_size'='1M'

ERROR at line 1:

ORA-15018: diskgroup cannot be created

ORA-15031: disk specification '/dev/mapper/mpathh' matches no disks

ORA-15025: could not open disk "/dev/mapper/mpathh"

ORA-15056: additional error message

Linux-x86_64 Error: 13: Permission denied

Additional information: 42

Additional information: -807671168

 

I checked the permissions; they were root:disk again. I changed them to oracle:dba and ran the command again.
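The check and the change themselves are simple (a sketch):

ls -l /dev/mapper/mpathh
chown oracle:dba /dev/mapper/mpathh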

SQL> CREATE DISKGROUP OCR_MIRROR EXTERNAL REDUNDANCY DISK '/dev/mapper/mpathh' ATTRIBUTE 'compatible.asm'='11.2.0.0.0','au_size'='1M'

ERROR at line 1:

ORA-15018: diskgroup cannot be created

ORA-15017: diskgroup "OCR_MIRROR" cannot be mounted

ORA-15063: ASM discovered an insufficient number of disks for diskgroup "OCR_MIRROR"

 

I ran the statement again, and this time got a different message:

SQL> CREATE DISKGROUP OCR_MIRROR EXTERNAL REDUNDANCY DISK '/dev/mapper/mpathh' ATTRIBUTE 'compatible.asm'='11.2.0.0.0','au_size'='1M'

ERROR at line 1:

ORA-15018: diskgroup cannot be created

ORA-15033: disk '/dev/mapper/mpathh' belongs to diskgroup "OCR_MIRROR"

 

 

I tried to mount the diskgroup and got the following error:

SQL> alter diskgroup ocr_mirror mount;

alter diskgroup ocr_mirror mount

*

ERROR at line 1:

ORA-15032: not all alterations performed

ORA-15017: diskgroup "OCR_MIRROR" cannot be mounted

ORA-15063: ASM discovered an insufficient number of disks for diskgroup "OCR_MIRROR"

 

I checked the permissions. They had changed again! I changed them back to oracle:dba, tried to mount the diskgroup, and got the following error:

SQL> alter diskgroup ocr_mirror mount

ERROR at line 1:

ORA-15032: not all alterations performed

ORA-15131: block  of file  in diskgroup  could not be read

 

Ohhh … Come on! I logged in to the ASM instance and queried the V$ASM_DISK and V$ASM_DISKGROUP views.

SQL> select count(1) from v$asm_disk;

  COUNT(1)
----------
         0

 

I changed the permissions to oracle:dba and ran the query again:

SQL> /

  COUNT(1)
----------
         1

 

Then I queried V$ASM_DISKGROUP:

 

SQL> select count(1) from v$asm_diskgroup;

  COUNT(1)
----------
         0

 

What??? The permissions change automatically while I query the V$ASM_DISKGROUP view? Yes … When you query V$ASM_DISKGROUP, Oracle checks the ASM_DISKSTRING parameter and reads the header of every disk matched by that parameter. For more information on this topic, you can check my following blog post:

V$ASM_DISKGROUP displays information from the header of ASM disks

So this means that when I queried the V$ASM_DISK view, Oracle scanned the disk (with a process that runs under the root user) and the permissions of the disk were changed back.

After editing the /etc/udev/rules.d/99-oracle-asmdevices.rules file and adding the following line, the problem was solved:

NAME="/dev/mapper/mpathh", OWNER="oracle", GROUP="dba", MODE="0660"
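After changing a udev rule it usually has to be reloaded and re-triggered before the new ownership shows up (a sketch; the exact commands can vary by distribution and udev version):

udevadm control --reload-rules
udevadm trigger
ls -l /dev/mapper/mpathh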

 

I then checked the permissions of the disk again after querying V$ASM_DISK multiple times, made sure they no longer changed, and ran the root.sh script. Everything worked fine and I got the following output:

ASM created and started successfully.

Disk Group OCR_MIRROR mounted successfully.

clscfg: -install mode specified

Successfully accumulated necessary OCR keys.

Creating OCR keys for user 'root', privgrp 'root'..

Operation successful.

CRS-4256: Updating the profile

Successful addition of voting disk 5feed4cb66df4f43bf334c3a8d73af92.

Successfully replaced voting disk group with +OCR_MIRROR.

CRS-4256: Updating the profile

CRS-4266: Voting file(s) successfully replaced

##  STATE    File Universal Id                File Name Disk group

--  -----    -----------------                --------- ---------

 1. ONLINE   5feed4cb66df4f43bf334c3a8d73af92 (/dev/mapper/mpathh) [OCR_MIRROR]

Located 1 voting disk(s).

CRS-2672: Attempting to start 'ora.asm' on 'vsme_ora1'

CRS-2676: Start of 'ora.asm' on 'vsme_ora1' succeeded

CRS-2672: Attempting to start 'ora.OCR_MIRROR.dg' on 'vsme_ora1'

CRS-2676: Start of 'ora.OCR_MIRROR.dg' on 'vsme_ora1' succeeded

Preparing packages for installation…

cvuqdisk-1.0.9-1

Configure Oracle Grid Infrastructure for a Cluster … succeeded

 

 

Posted in RAC issues | 1 Comment »

Getting ORA-01105 during RAC db startup

Posted by Kamran Agayev A. on 30th July 2014

Today, while starting the instances of a 2-node RAC database (10gR2 on Linux), I got the following error on the first node:

ORA-01105: mount is incompatible with mounts by other instances
ORA-01677: standby file name convert parameters differ from other instance

 

I checked the alert.log file, but there was not enough information to solve the issue:

Wed Jul 30 09:58:48 AZST 2014
Setting recovery target incarnation to 2
ORA-1105 signalled during: ALTER DATABASE MOUNT…
Wed Jul 30 09:58:58 AZST 2014
SUCCESS: diskgroup DATA was dismounted

 

After playing with some initialization parameters, I found a metalink note where this behaviour is described as a bug (Bug 13001004).

Check out the following metalink note:

Spfile defined in OCR is not used if one exists in $ORACLE_HOME/dbs (Doc ID 1373622.1)

 

The solution is to move the parameter file to the centralized directory (/ocfs) and remove any instance_name parameter.
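A hedged sketch of that layout (database name and paths are examples): keep only a pointer init file under $ORACLE_HOME/dbs on each node and store the real spfile in the shared location:

# $ORACLE_HOME/dbs/initRACDB1.ora on node 1 (initRACDB2.ora on node 2)
SPFILE='/ocfs/oradata/RACDB/spfileRACDB.ora'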

Posted in Administration, RAC issues | No Comments »

Using odd number of disks for Voting disk

Posted by Kamran Agayev A. on 29th May 2014

As you already know, you should use an odd number of disks for the voting disk. A node must be able to access strictly more than half of the voting disks at any time. Let me show you how it works. I have installed and configured a two-node 11gR3 RAC on VirtualBox and will use the following scenario to demonstrate:

– Create a diskgroup with 3 failure groups and 3 different disks

– Move the voting disk to the new diskgroup. Shut down the second node and detach one of the disks. In this case the cluster should still start, as it can access more than half of the voting disks (2 out of 3)

– Start the second node; the cluster should come up. Shut the second node down again, detach a second voting disk, and start it. This time the cluster will not start. Check the ocssd.log file

– Shut down all nodes, reattach the previous disks and start again. The cluster will come up

Here’re the detailed steps:

– Create a diskgroup :

Pic1
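The screenshot shows the diskgroup creation; it would look roughly like this (disk paths are illustrative):

SQL> CREATE DISKGROUP vdisk NORMAL REDUNDANCY
     FAILGROUP fg1 DISK '/dev/sdd1'
     FAILGROUP fg2 DISK '/dev/sde1'
     FAILGROUP fg3 DISK '/dev/sdf1';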

– Mount the diskgroup at the second node:

SQL> ALTER DISKGROUP vdisk MOUNT;

 

– Replace voting disk, move it to the new diskgroup and query the voting disk:
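The commands behind the screenshot below would roughly be the following, run as the Grid software owner:

crsctl replace votedisk +VDISK
crsctl query css votedisk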

Pic2

– Shut down the second node and detach one of the disks of the VDISK diskgroup:

Pic3

– Start the second node, query the voting disks and check whether the clusterware is up:

Pic4

– Shut down the second node again, remove the second disk of the voting diskgroup and start the node:

pic5

– Check the log file at $GRID_HOME/log/node2/cssd/ocssd.log :

2014-05-29 01:51:23.055: [ CSSD][2946955008]clssnmvVerifyCommittedConfigVFs: Insufficient voting files found, found 1 of 3 configured, needed 2 voting files
2014-05-29 01:51:23.055: [ CSSD][2946955008](:CSSNM00020:)clssnmvVerifyCommittedConfigVFs: voting file 0, id 279c162c-1b964f88-bfb1d622-aecc9e4e not found
2014-05-29 01:51:23.055: [ CSSD][2946955008](:CSSNM00020:)clssnmvVerifyCommittedConfigVFs: voting file 1, id 7e282f3f-5e514f42-bfb79396-c69fda76 not found
2014-05-29 01:51:23.055: [ CSSD][2946955008](:CSSNM00021:)clssnmCompleteVFDiscovery: Found 1 voting files, but 2 are required. Terminating due to insufficient configured voting files

– As you can see, the cluster is down. Now shut down both nodes, add the disks back to the second node and check the status of the clusterware:

Pic6

Posted in RAC issues | No Comments »

How to troubleshoot CRSCTL REPLACE VOTEDISK error?

Posted by Kamran Agayev A. on 27th May 2014

It took me some time to investigate why the CRSCTL REPLACE VOTEDISK command was not working:

[oracle@node1 ~]$ crsctl replace votedisk VDISK
CRS-4264: The operation could not be validated
CRS-4000: Command Replace failed, or completed with errors.

When you get an error during voting disk replacement, make sure you check the following items:

– Make sure the disk group you’re moving the voting disk to is mounted on all nodes.

– Make sure the compatibility parameter is set to the version of Grid software you’re using. You can change it using the following command:

alter diskgroup VDISK set attribute 'compatible.asm'='11.2';

Query the V$ASM_DISKGROUP view to make sure the compatibility is the same as for the rest of the disk groups and matches the version of the Grid software:

select group_number, name, compatibility, database_compatibility from v$asm_diskgroup;

– Check the alert.log of the ASM instance and any available ASM trace files. Check the /var/log/messages file and trace the replace command using strace. See if you can catch any errors in the output:

[grid@node5 ~]$ strace crsctl replace votedisk VDISK

– Make sure you have an odd number of voting disks

– Make sure there’s enough space in the diskgroup

– Make sure the disk permissions are correct

– Make sure you’re running the command using Grid Software owner

Today, none of the above checks revealed the problem :) In my case, the issue was that I was using the wrong “crsctl” binary. After upgrading the RAC environment from 11.2.0 to 11.2.3 I was still using the old crsctl (by accident, I had forgotten to set the environment variables correctly). But no need to worry, it was a test database.
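A quick way to avoid that trap is to confirm which binary and which home you are actually using before digging deeper (a sketch):

which crsctl
echo $ORACLE_HOME
crsctl query crs activeversion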

Let me know if you have any additional checks for investigating a voting disk replacement failure.

Cheers

Posted in RAC issues | 4 Comments »