Kamran Agayev's Oracle Blog

Oracle Certified Master

Solution for ORA-27154: post/wait create failed ; ORA-27302: failure occurred at: sskgpbitsper

Posted by Kamran Agayev A. on June 21st, 2019

Today, while creating an empty database on an Exadata machine that had plenty of free disk space and memory, we got the following error:

SYS@TEST> startup nomount
ORA-27154: post/wait create failed
ORA-27300: OS system dependent operation:semget failed with status: 28
ORA-27301: OS failure message: No space left on device
ORA-27302: failure occurred at: sskgpbitsper

 

The problem had nothing to do with disk space, even though the error message says “No space left on device”.

The clue in the error output is “OS system dependent operation:semget”, where “sem” stands for “semaphore”. Despite plenty of free memory and disk space, the process couldn’t allocate the semaphores it needed, either because a kernel parameter wasn’t configured correctly or because a semaphore limit had already been reached. To get information about semaphores and shared memory, I ran the ipcs command:

[oracle@node2~]$ ipcs
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status 
0x00000000 0 root 644 64 2 dest 
0x00000000 32769 root 644 16384 2 dest 
0x00000000 65538 root 644 280 2 dest 
0x00000000 98307 root 644 80 2 
0x00000000 131076 root 644 16384 2 
0x00000000 163845 root 644 280 2 
0x00000000 262602758 oracle 640 4096 0
------ Semaphore Arrays --------
key semid owner perms nsems 
0x61000625 98306 root 666 1 
0x00000000 163844 root 666 3 
0x00000000 1769477 root 666 3 
0x00000000 4096006 root 666 3 
0xd9942a14 3604487 oracle 600 514 
0xd9942a15 3637256 oracle 600 514 
0xd9942a16 3670025 oracle 600 514 
0x192b36e8 219578379 oracle 640 1004 
0x5f94bc50 6062092 oracle 640 1004 
0x00000000 286752781 root 666 3 
0xaa3762f4 6324238 oracle 640 154

 

The list was long, so I decided to count the rows:

[oracle@node2 ~]$ ipcs -s | wc -l
256
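Note that wc -l also counts the header and blank lines that ipcs prints, so the figure above is slightly higher than the real number of arrays. A minimal sketch for counting only the data rows, assuming the usual Linux ipcs layout where each row starts with a hexadecimal key (the function name and the sample text are illustrative, not part of the original session):

```python
def count_semaphore_arrays(ipcs_output):
    """Count data rows in `ipcs -s` output, skipping headers and blank lines."""
    count = 0
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Data rows start with a hexadecimal key such as 0x61000625.
        if fields and fields[0].startswith("0x"):
            count += 1
    return count


# Shortened sample in the same layout as the listing above.
sample = """
------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0x61000625 98306      root       666        1
0x00000000 163844     root       666        3
0xd9942a14 3604487    oracle     600        514
"""

print(count_semaphore_arrays(sample))  # counts only the three data rows
```

On a live system the same function could be fed the text returned by subprocess.check_output(["ipcs", "-s"]).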

 

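So roughly 256 semaphore arrays (sets) are allocated; the exact figure is slightly lower, because wc -l also counts the header lines that ipcs prints. Then I checked the /etc/sysctl.conf file for the kernel.sem parameter: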

[oracle@node2 ~]$ more /etc/sysctl.conf | grep sem
kernel.sem = 1024 60000 1024 256
[oracle@node2 ~]$
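The four numbers in kernel.sem are, in order, SEMMSL (max semaphores per array), SEMMNS (max semaphores system wide), SEMOPM (max ops per semop call) and SEMMNI (max number of arrays). A small sketch that labels them, where the SemLimits and parse_kernel_sem names are made up for illustration:

```python
from collections import namedtuple

# Field order of the Linux kernel.sem setting: SEMMSL SEMMNS SEMOPM SEMMNI
SemLimits = namedtuple("SemLimits", "semmsl semmns semopm semmni")


def parse_kernel_sem(line):
    """Parse 'kernel.sem = a b c d' (or the bare 'a b c d' from /proc/sys/kernel/sem)."""
    value = line.split("=", 1)[-1]
    return SemLimits(*(int(v) for v in value.split()))


limits = parse_kernel_sem("kernel.sem = 1024 60000 1024 256")
print(limits.semmni)  # the system-wide limit on semaphore arrays
```

The last field is the one that matters for this error: once the number of existing arrays reaches it, semget cannot create another set.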

You can get more detailed output from the ipcs -ls command as follows:

 

[oracle@node2 ~]$ ipcs -ls
------ Semaphore Limits --------
max number of arrays = 256
max semaphores per array = 1024
max semaphores system wide = 60000
max ops per semop call = 1024
semaphore max value = 32767

 

The last value of kernel.sem (“max number of arrays” in the ipcs -ls output) is the maximum number of semaphore sets for the entire OS. In this case you have two options to solve the problem:

  • Increase the max number of arrays parameter in the /etc/sysctl.conf file
  • Remove unnecessary semaphores

 

Increasing the max number of arrays parameter is the easiest (and the fastest) way. Here is how it works:

 

1. Get the value for the SEM parameter:

[root@node2 ~]# cat /etc/sysctl.conf | grep sem
kernel.sem = 1024 60000 1024 256
[root@node2 ~]#

2. Edit the file and change the last value to 260 (more than the value you get from the “ipcs -s | wc -l” command), then run the following command to load the new setting from /etc/sysctl.conf into the running kernel:

/sbin/sysctl -p

3. Create a dummy parameter file and start the instance in NOMOUNT mode to see if the oracle user can now allocate a semaphore set:

[oracle@node2 dbs]$ more initTEST.ora
db_name=TEST
sga_target=2g

 

[oracle@node2 ~]$ export ORACLE_SID=TEST
[oracle@node2 ~]$ sqlplus / as sysdba
SYS@TEST> startup nomount
ORACLE instance started.
Total System Global Area 2137886720 bytes
Fixed Size 2254952 bytes
Variable Size 956303256 bytes
Database Buffers 1090519040 bytes
Redo Buffers 88809472 bytes
SYS@TEST>

 

It worked!
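The edit from step 2 can be sketched in a few lines of Python. This is only an illustration of which field changes; the with_semmni helper is hypothetical and works on the text of the line, not on the file itself:

```python
def with_semmni(line, new_semmni):
    """Return the kernel.sem line with its fourth value (SEMMNI) replaced."""
    name, _, value = line.partition("=")
    fields = value.split()
    if len(fields) != 4:
        raise ValueError("expected four values in kernel.sem")
    fields[3] = str(new_semmni)
    return "{} = {}".format(name.strip(), " ".join(fields))


print(with_semmni("kernel.sem = 1024 60000 1024 256", 260))
```

Only the last value is touched; the other three limits stay as they were.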

 

The second option is to find the stale semaphores and remove them. Each semaphore records the PID of a process in the OS. In the following example I have about 256 semaphore arrays overall, 23 of which belong to the oracle user (db instances etc.) and 229 to the root user. Most of the processes that used these semaphores died a long time ago, but the semaphores were never cleaned up. To find the PID behind a semaphore array, we run the ipcs command with the -i parameter. First, let’s list the semaphores of the oracle user and check one of them as follows:

[root@node2 ~]# ipcs -s | grep oracle 
0xcfe88130 3473414 oracle 600 514 
0xcfe88131 3506183 oracle 600 514 
0xcfe88132 3538952 oracle 600 514 
0xf0720010 411041803 oracle 640 802 
0xf8121f34 145653772 oracle 640 1004 
0xf0720011 411074573 oracle 640 802 
0xf0720012 411107342 oracle 640 802 
0xc5d91710 196444189 oracle 640 504 
0x86d48ae8 44236836 oracle 640 304 
0x67556608 199786542 oracle 640 876 
0x67556609 199819311 oracle 640 876 
0x6755660a 199852080 oracle 640 876 
0x6755660b 199884849 oracle 640 876 
0x6755660c 199917618 oracle 640 876 
0x806b87cc 157450352 oracle 640 752 
0x806b87cd 157483121 oracle 640 752 
0x806b87ce 157515892 oracle 640 752 
[root@node2 ~]#

 

Next, we run the ipcs command with the -i parameter to get the list of PIDs as follows:

[root@node2 ~]# ipcs -s -i 157450352 | more

Semaphore Array semid=157450352
uid=1001 gid=1002 cuid=1001 cgid=1002
mode=0640, access_perms=0640
nsems = 752
otime = Fri Jun 21 18:51:23 2019 
ctime = Fri Jun 21 18:51:23 2019 
semnum value ncount zcount pid 
0 1 0 0 315611 
1 4893 0 0 315611 
2 10236 0 0 315611 
3 32760 0 0 315611 
4 0 0 0 0 
5 0 0 0 0 
6 0 0 0 315729 
7 0 1 0 315731 
8 0 0 0 0 
9 0 1 0 315739 
10 0 0 0 0 
11 0 1 0 315743 
12 0 0 0 315745 
13 0 1 0 315747 
14 0 1 0 315749

 

Next, we run the ps command to check the PID:

[root@node2 ~]# ps -fp 315729
UID PID PPID C STIME TTY TIME CMD
oracle 315729 1 0 2018 ? 01:14:13 ora_pmon_SNEWDB
[root@node2 ~]#

 

As you see, this semaphore array is associated with a running database instance. Now let’s repeat the same steps for the semaphores of the root user:

[oracle@node2 ~]$ ipcs -s |grep root
0x61000625 98306 root 666 1
0x00000000 163844 root 666 3
0x00000000 1769477 root 666 3
0x00000000 4096006 root 666 3
0x00000000 248774666 root 666 3
0x00000000 286752781 root 666 3
0x00000000 6357007 root 666 3

 

Now we run the ipcs -s -i command for one of these semaphores (248774666) to find the PID:

[oracle@node2 ~]$ ipcs -s -i 248774666
Semaphore Array semid=248774666
uid=0 gid=11140 cuid=0 cgid=11140
mode=0666, access_perms=0666
nsems = 3
otime = Sun Dec 16 18:34:22 2018
ctime = Sun Dec 16 18:34:22 2018
semnum value ncount zcount pid
0 1024 0 0 156155
1 32000 0 0 156155
2 0 0 0 156155

 

If we check the PID in the system, we see that no such process exists:

[oracle@node2 ~]$ ps -fp 156155
UID PID PPID C STIME TTY TIME CMD
[oracle@node2 ~]$
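The manual check above (read the pid column, then look the PID up) can be sketched as follows. The helper names are hypothetical, and the result is only a hint: the pid column shows the last process that operated on each semaphore, so always confirm by hand before removing anything.

```python
import os


def pids_from_ipcs_detail(text):
    """Collect non-zero PIDs from the per-semaphore rows of `ipcs -s -i <semid>`."""
    pids = set()
    for line in text.splitlines():
        fields = line.split()
        # Per-semaphore rows have five integer columns: semnum value ncount zcount pid
        if len(fields) == 5 and all(f.isdigit() for f in fields):
            pid = int(fields[4])
            if pid:
                pids.add(pid)
    return pids


def is_alive(pid):
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


# Rows copied from the ipcs -s -i 248774666 output above.
detail = """semnum value ncount zcount pid
0 1024 0 0 156155
1 32000 0 0 156155
2 0 0 0 156155
"""

print(sorted(pids_from_ipcs_detail(detail)))
```

An array becomes a removal candidate only when is_alive returns False for every PID collected from it.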

 

Now we can safely remove that semaphore from memory using the ipcrm command in order to free a slot for new semaphores:

[root@node2 ~]# ipcrm -s 248774666
[root@node2 ~]#
Let's check if it was removed:
[root@node2 ~]# ipcrm -s 248774666
ipcrm: invalid id (248774666)
[root@node2 ~]#

 

As you see, we found a semaphore array whose associated process no longer exists in the system, and removed it to make room for new semaphores. Now let’s start the instance:

SYS@TEST> startup nomount
ORACLE instance started.
Total System Global Area 2137886720 bytes
Fixed Size 2254952 bytes
Variable Size 956303256 bytes
Database Buffers 1090519040 bytes
Redo Buffers 88809472 bytes
SYS@TEST>

 

Posted in Administration | No Comments »

Connect to Oracle from Python – write your first Python script!

Posted by Kamran Agayev A. on June 3rd, 2019

Python is getting more popular nowadays because it is reliable and efficient, it has great corporate sponsors, and its amazing libraries help you save time during the initial development cycle.

It’s much easier to connect to an Oracle Database from Python by using the cx_Oracle module. To get more information about the cx_Oracle module, check the following links:

https://oracle.github.io/python-cx_Oracle/ 

https://cx-oracle.readthedocs.io/en/latest/installation.html

 

In this blog post, I will show how to install Python, configure the environment and connect to the database.

First of all, make sure you have an internet connection and install Python with yum as follows:

yum install python

After Python is installed, install easy_install on Linux in order to download and manage Python packages easily, using the following command:

wget http://bootstrap.pypa.io/ez_setup.py -O - | sudo python

easy_install installation

Next install pip using easy_install as follows:

pip_installation

Now install cx_Oracle module using pip as follows:

install_cx_Oracle_using_pip

 

Now install Oracle Instant Client:

cd /etc/yum.repos.d
wget https://yum.oracle.com/public-yum-ol7.repo
yum install -y yum-utils
yum-config-manager --enable ol7_oracle_instantclient
yum list oracle-instantclient*

yum_list_oracle_instantclient

 

Now install the Oracle Instant Client basic and sqlplus packages as follows:

yum_install_oracle_instantclient

 

After installing Oracle client, configure environment variables as follows:

vi .bashrc
export CLIENT_HOME=/usr/lib/oracle/18.3/client64
export LD_LIBRARY_PATH=$CLIENT_HOME/lib
export PATH=$PATH:$CLIENT_HOME/bin

 

Run the .bashrc file to set the environment variables and write your first Python script as follows:

vi connect.py 
import cx_Oracle
con=cx_Oracle.connect('username/password@ip_address/service_name')
print(con.version)
con.close()

 

If we run this script, we will get Oracle Database version in the output:

[root@oratest ~]# python connect.py
11.2.0.4.0
[root@oratest ~]#

 

Now let’s use the split function in Python to split the version into “Version, Release and Patchset” sections as follows:

import cx_Oracle
con=cx_Oracle.connect('username/password@ip_address/service_name')
ver=con.version.split(".")
print('Version: {}\nRelease: {}\nPatchset: {}'.format(ver[0], ver[1], ver[3]))
con.close()

[root@oratest ~]# python connect.py
Version: 11
Release: 2
Patchset: 4
[root@oratest ~]#
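The same split can be tried without a database connection. A minimal sketch, assuming the five-part version string shown above; parse_version is a made-up helper name, not part of cx_Oracle:

```python
def parse_version(version):
    """Split a dotted Oracle version string such as '11.2.0.4.0' into named parts."""
    parts = version.split(".")
    # Index 3 is the patchset component, matching ver[3] in the script above.
    return {"version": parts[0], "release": parts[1], "patchset": parts[3]}


info = parse_version("11.2.0.4.0")
print("Version: {version}\nRelease: {release}\nPatchset: {patchset}".format(**info))
```

Keeping the parsing in its own function makes it easy to check the indexes before pointing the script at a real connection.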

 

Now let’s create a table in Oracle and write simple Python code to query and print all rows in the table:

SQL> create table test_table(id number, name varchar2(10));
Table created.
SQL> insert into test_table values(1,'Oracle DB');
1 row created.
SQL> insert into test_table values(2,'SQL');
1 row created.
SQL> insert into test_table values(3,'PL/SQL');
1 row created.
SQL>

 

Commit the inserts (uncommitted rows are not visible to other sessions), then write Python code to query the table:

import cx_Oracle
con=cx_Oracle.connect('username/password@ip_address/service_name')
cur=con.cursor()
cur.execute('select * from test_table order by 1')
for result in cur:
    print(result)
cur.close()
con.close()

 

[root@oratest ~]# python connect.py
(1, 'Oracle DB')
(2, 'SQL')
(3, 'PL/SQL')
[root@oratest ~]#
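cx_Oracle follows the Python DB-API 2.0 (PEP 249), so the query loop above has the same shape with other drivers. The sketch below uses the standard-library sqlite3 module purely as a stand-in so that it runs without an Oracle client; fetch_rows is a hypothetical helper, and the SQLite column types are an assumption that replaces number and varchar2:

```python
import sqlite3
from contextlib import closing


def fetch_rows(con, sql):
    """Run a query on a DB-API 2.0 connection and return all rows as a list."""
    # closing() guarantees cur.close() is called even if execute() raises.
    with closing(con.cursor()) as cur:
        cur.execute(sql)
        return list(cur)


con = sqlite3.connect(":memory:")
con.execute("create table test_table(id integer, name text)")
con.executemany("insert into test_table values(?, ?)",
                [(1, "Oracle DB"), (2, "SQL"), (3, "PL/SQL")])
con.commit()

rows = fetch_rows(con, "select * from test_table order by 1")
for row in rows:
    print(row)
con.close()
```

With cx_Oracle, the connection passed to fetch_rows would come from cx_Oracle.connect(...) as in the script above.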

Congratulations! You’ve installed and configured Python, connected to an Oracle database, queried the table and printed the output!

Posted in Administration | 3 Comments »

Second OCM exam is cleared. New book and online course are on the way

Posted by Kamran Agayev A. on April 23rd, 2019

Two months ago, after long preparation, I decided to upgrade my OCM certification and registered for the exam in Shanghai. A few years ago, when I cleared the 10g OCM exam, I started preparing for the upgrade right away. I did a lot of research and hands-on practice and then thought it would be great to collect everything I had in a single book. It took me almost 2 years to publish the book. A few months after it was published, I started getting emails from readers about how the book helped them during their preparation, and I was happy to see them passing the exam! With a lot of different projects going on in those days, I didn’t manage to take the exam myself. And unfortunately the 1-day 11g OCM exam was retired, which meant that I had to take the 2-day exam again! But it was OK. If that was the only option, there was nothing else I could do.

I will not talk about how hard my travel was, but eventually the exam day arrived. It was 9 sections (2 days) with a lot of different practical tasks. I also wouldn’t like to go into more detail regarding the questions and so on, but what I realized was that the book I had published even before taking the OCM 11g exam covered almost everything I got in the exam 😊 Reviewing topics directly from my book helped me to be confident during the exam.

A few weeks passed, and I got a happy email from Oracle saying that I had passed the exam and become 2xOCM. Now it’s time for the third and last one )) And it means that I’ve already started my preparation, along with the new book, which will be published in a few months.

For those of you who want to clear the OCM 11g exam: believe it or not, my book covers almost all the topics. And after clearing the second OCM exam, I decided to start an online course and help you with your preparation individually. So stay tuned and I will announce the course information shortly 😊

OCM 11g Certificate

Posted in Uncategorized | 2 Comments »

PRCR-1079 : Failed to start resource oranode1-vip. CRS-2680 Clean failed. CRS-5804: Communication error with agent process

Posted by Kamran Agayev A. on April 15th, 2019

Last week we had a clusterware issue in one of our critical 3-node RAC environments. On the first node, the network resource was restarted, which ended up killing all sessions on that node abnormally. The Oracle VIP that was running on that node failed over to the third node. The first node was up and running, but didn’t accept connections, because it was trying to register the instance using the LOCAL_LISTENER parameter, which pointed to oranode1-vip, and that VIP was no longer running on that node. We tried to relocate it back to the first node, but it failed because the VIP couldn’t be stopped. Every time we tried to stop or relocate it, the cleaning process started and failed within a few minutes.

Neither support nor we found any useful information in the clusterware log files. Although 2 instances were up and running, the load was so high that they could barely handle all connections. Ping to oranode1-vip succeeded, but we weren’t able to stop the VIP even in force mode. We couldn’t start it either, because it hadn’t stopped and cleaned up successfully. The status was “enabled” and “not running”, but ping was OK:

db-bash-$ srvctl status vip -i oranode1-vip
VIP oranode1-vip is enabled 
VIP oranode1-vip is not running 
db-bash-$

From the crsctl stat res command we could see that it was OFFLINE and had failed over to node03:

 

db-bash-$ crsctl stat res -t
oranode1-vip  1 OFFLINE UNKNOWN node03

 

And it failed when we tried to start it:

db-bash-$ srvctl start vip -i oranode1-vip   
PRCR-1079 : Failed to start resource oranode1-vip  
CRS-2680: Clean of 'oranode1-vip  ' on 'node03' failed 
CRS-5804: Communication error with agent process

 

We cleared the socket files of the first node from the /var/tmp/.oracle folder, restarted the CRS and checked whether the VIP failed back, but it didn’t. Support asked us to stop the second node, clear the socket files and start it to see if anything changed, but we didn’t do it, because a single node wouldn’t be able to handle all connections.

In the end, we checked the interface of the virtual IP at the OS level, and found it on node03:

db-bash-$ netstat -win
lan900:805 1500 #### #### 2481604 0 51 0 0

 

Instead of restarting the CRS of the production database (which takes 10 minutes), we decided to bring that interface down at the OS level. For HP-UX, it’s the ifconfig … down command.

Before running this command in the production environment, we tried it in the test environment and realized that the down parameter alone is not enough. We have to provide the 0.0.0.0 IP address along with the down parameter to bring down that interface. So we ran the following command to bring it down:

ifconfig lan900:805 0.0.0.0 down

And it disappeared from the list. Next, we started the VIP using the srvctl start vip command and it succeeded!

Lessons learned:

  • Perform all actions in the test environment (if you are not sure what can happen) before trying them in the production environment
  • Don’t rush to “restart” or “reboot” the instance, the cluster or the node. Sometimes it just doesn’t solve your problem, and after a restart the system may fail to start up correctly (because of changed parameters, configuration, etc.)
  • In 24 hours, the severity #1 SR was assigned to 6 different engineers. It takes a lot of time to gather log files, submit them and have them reviewed by an Oracle engineer before his/her shift ends. Sometimes you just don’t have time to wait for an answer from Oracle; you have to do it on your own and take all the risks. That requires experience.

Posted in RAC issues | No Comments »

ODev Yathra Tour 2018 – discovering Incredible India

Posted by Kamran Agayev A. on August 10th, 2018

Last month, after a long brainstorm, I decided to take my chance and agreed to participate in the Indian Oracle ODev Yathra tour. Although I had visited India (Hyderabad) 2 times in the past for the Sangam conferences, I wanted to discover more of India and decided to take 4 cities out of 7.

For the Yathra Tour I submitted 2 papers:

The first one was about “8 ways to migrate your On-Premises database to Oracle Cloud”, where I talked about different ways to migrate the database to the Oracle Cloud based on the downtime and the migration requirements.

The second session “Create, configure and manage Disaster Recovery in Oracle Cloud for On-Premises database” was about creating, configuring and managing DR on the Oracle Cloud using different techniques as well as configuring high level database, backup and network security.

When the agenda was published, I got a lot of messages from DBAs in the cities where I was not supposed to participate, saying that they were looking forward to meeting me. So I talked to Sai Ram, the organizer of the Yathra Tour, and he managed to put me on the agenda of the remaining cities. I made one of the hardest decisions of my life and took all the cities. I had (and still have) a lot of ongoing projects in my company and had health issues that made it hard for me to travel long distances for two weeks. But I decided to push my limits and go beyond them.

So, finally, the travel started. I took my first flight to Abu Dhabi, and from Abu Dhabi to Chennai. I landed in Chennai, took a cab to the hotel, had some rest and was in the lobby at 7.30 AM the next morning. Yes, this was the common checkout time from the hotel every day :) I met Oracle Fusion expert Basheer Khan, Machine Learning PM Sandesh and Exadata PM Gurmit in the morning and we had breakfast together. Then we took a cab and went to the venue.

Chennai_1 Chennai_2 Chennai_3

So the daily routine for the conference was: 7.30 AM checkout from the hotel, cab drive to the venue, registration, introduction speech by Sai and other AIOUG members, then delivering presentations, having lunch (most of the time spicy Indian lunch :) ), closing ceremony at 6.00 PM, driving to the airport, flying to the next city, a bunch of security checks etc., driving to the hotel, check-in and off to bed at 1.00 AM, and then checkout at 7.30 AM and off to the next venue again. Scary, right? :)

The next city was Bengaluru. As we had one extra day there, I decided to have lunch outside in a random restaurant. The place was near the hotel and I ordered biryani as always :) Although I asked for “less spicy” biryani, I was served the spicy one. My tongue was burned and I could hardly drink tea for the next 2 days :) But it was very delicious. In the evening I took a small trip to MG (Mahatma Gandhi) Road. It was a very crowded, fascinating place, and I hardly got rid of a man who was chasing me and trying to sell me a chess set for 1500 rupees (which originally was 600 rupees) :) He didn’t know I train Jiu-Jitsu :)

Bengaluru_1 Bengaluru_2

The next city was Ahmedabad. And I was not the only person who was visiting this city for the first time. Actually none of us (mostly Indian speakers) had visited Ahmedabad before :) The roads of this city were wide, and I was told that the Ahmedabad guys are the coolest guys in India )) My session was after lunch, so I managed to sleep a little bit more and arrived at the venue later. But unfortunately I didn’t manage to visit the open-air barber whom I was filming with curiosity. He waved to me and invited me to try his service, but I was late for my session.

 

Ahmedabad_1 Ahmedabad_2

The next city was Hyderabad, and the airport was very familiar to me, as I had already visited Hyderabad 2 times before. Again, I was fortunate to have only one session, after lunch, so I arrived at the venue a little bit later, met a lot of friends from my previous visits, and all of us were off to the airport right after the conference.

 

Hyderabad_1 Hyderabad_2

And we headed to Pune. I was happy, because we had an extra day in Pune. We arrived in the city in the evening, and the next day, after having lunch in the hotel, I skipped the city tour with the speakers who were more energized than me :), found a Starbucks coffee shop, spent a few hours reading a book (Ikigai, a Japanese concept that means “a reason for being”) and relaxed a lot. The next day, we checked out from the hotel early in the morning and went to the Oracle office, which was a bit far from the hotel, so fortunately we got a city tour along the way )) The venue was huge and beautiful and there was a coffee machine that I used a lot to stay alive. We had very interactive sessions, and after the conference the bus was waiting to take us to Mumbai! It took us approximately 4 hours to reach Mumbai, but we enjoyed the trip a lot. In the following link you can see part of our trip in Connor’s video shoot :)

https://www.youtube.com/watch?v=eUkQqj6oDZw

 

Pune1 Pune_2

The Mumbai meetup was awesome. I got more questions in just a single session than in the rest of the tour :) and ended up finishing the 45-minute session in an hour and a half! But it was not just a presentation: because of those questions the session was like a discussion, which I liked a lot!

Mumbai_1 Mumbai_2

And after the conference, we headed to the airport to fly to the last city, Gurgaon! The next morning I was extremely tired and could barely walk or stand straight. But I got a lot of positive energy from the attendees and did 2 sessions successfully. As my flight was the next day at 4.00 AM, I returned to the hotel, had some rest, headed to the airport and returned to my lovely country, Azerbaijan.

 

Gurgaon_1 Gurgaon_2 Gurgaon_3

So overall, the trip was awesome! It was hard, but it was worth it. I made new friendships, met online friends who had been using my blog posts for years, got a lot of positive feedback, listened to stories about how my blog posts saved their lives, etc. :) and it motivated me to write more blog posts in the future. I also attended sessions of other speakers and learned a lot in terms of both presentation and technical skills.

I would like to thank the ODev Yathra Tour organizers, especially Sai Ram for all he did to make us feel at home, the AIOUG staff, the ACE program (especially Jennifer and Lori) for supporting us, and all attendees for taking the time to attend our sessions. I love India and the community a lot and am looking forward to visiting amazing and incredible India again!

Posted in Uncategorized | 1 Comment »

Download and install Oracle Database 18c – NOW!

Posted by Kamran Agayev A. on July 25th, 2018

Most of you have already seen that Oracle Database 18c has been released. If you haven’t downloaded and installed it yet, let’s do it!

First of all, go to the following address and download the Oracle Database 18c installation file:

http://www.oracle.com/technetwork/database/enterprise-edition/downloads/oracle18c-linux-180000-5022980.html

If you want to download it from the host itself, you can use wget by providing the username, password and the installation zip file as follows:

wget --http-user=YOUR_USERNAME --http-password=YOUR_PASSWORD --no-check-certificate --output-document=LINUX.X64_180000_db_home.zip "https://download.oracle.com/otn/linux/oracle18c/180000/LINUX.X64_180000_db_home.zip"
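The wget options take two plain ASCII hyphens each, which is easy to get wrong when the command is copied from a web page. As a hedged illustration, the same command can be assembled as an argument list in Python; build_wget_command is a made-up helper, and the credentials are placeholders:

```python
import shlex

URL = ("https://download.oracle.com/otn/linux/oracle18c/180000/"
       "LINUX.X64_180000_db_home.zip")


def build_wget_command(user, password, url, output):
    """Build the wget argument list; each long option starts with two hyphens."""
    return [
        "wget",
        "--http-user=" + user,
        "--http-password=" + password,
        "--no-check-certificate",
        "--output-document=" + output,
        url,
    ]


cmd = build_wget_command("YOUR_USERNAME", "YOUR_PASSWORD", URL,
                         "LINUX.X64_180000_db_home.zip")
# Print a shell-safe version of the command for review before running it.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The list form can be passed directly to subprocess.run(cmd) without any shell quoting.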

If you want to get more information on this technique, check the following My Oracle Support note:

Using WGET to download My Oracle Support Patches (Doc ID 980924.1)

 

Next, unzip the file and run ./runInstaller :

Capture1

Choose the first option and click Next:

Capture2

If you don’t want to choose the components and configure the advanced options, choose “Desktop class” and click Next:

Capture3

Provide the Oracle Base and database file locations, database name and the SYS password and click Next

 

Capture4

Check the summary information and click Install

 

Capture5

Installation (actually relinking) will proceed and you will be asked to run the root.sh script as the root user. Run it and click OK to proceed:

 

Capture6

The installer will create a database, show the OEM page and finish.

Click Close, switch to the terminal, log in to the database and start getting your hands dirty with Oracle 18c!

Capture7

In the next posts, I will share 18c new features with practical use cases. Good luck!

 

Posted in Administration | 2 Comments »

The most horrific Oracle messages you might get in the production database – or – why DBAs get older

Posted by Kamran Agayev A. on May 10th, 2018

If you are a production DBA of a mission-critical system, then you might have already seen the following critical, I would say mortal, messages in your alert.log file.

  • Your database was up and running, you shut it down and open it again, and it fails to MOUNT the database and aborts

image_1

  • The database hung with millions of online transactions and aborted. You start the instance, switch to MOUNT mode, do some maintenance tasks and try to open the database and …. wait …. wait …. wait …..

image_2

  • system01.dbf contains corrupted blocks

 

Image_3

  • After taking 15 hours to restore the database, you run the recover database command and get the following errors:

image_4

  • When you are done with restore/recover, open the database with the RESETLOGS option and see the following errors:

 

Image_5

  • When you have missing datafiles of a 10 TB tablespace due to hard disk corruption and don’t have a backup

image_6

  • Incomplete recovery due to missing archived log files, where most probably you are going to fail even with the *._allow_resetlogs_corruption parameter

 

Image_7

  • When your database hangs, you get a hard disk corruption and lose some datafiles, it takes an hour and a half to perform instance recovery, and you just wait all that time for the database to open:

 

Image_8

  • Aaaand the most annoying message during the recovery

 

Image_9

I will keep updating this post with your and my screenshots. Feel free to send me screenshots of cases where you were stressed, but eventually managed to solve the database issue.

Posted in Administration | 2 Comments »

How to pass Oracle Database 12c: RAC and Grid Infrastructure Administration exam – 1Z0-068 and become Oracle Certified Expert

Posted by Kamran Agayev A. on May 3rd, 2018

In this post I will talk about how I prepared for and passed the 12c RAC and Grid Infrastructure Administration exam.

 

About the exam

Check the following link to get more information about the exam from Oracle University page:

https://education.oracle.com/pls/web_prod-plq-dad/db_pages.getpage?page_id=5001&get_params=p_exam_id:1Z0-068

 

The exam consists of 3 parts:

– Oracle 12c ASM Administration
– Oracle 12c Grid Infrastructure Installation and Administration
– Oracle 12c RAC Administration

 

I don’t want to scare you, but the exam is quite hard. The bad thing is that you fail the entire exam if you fail one of the sections. This means that you have to be well prepared for all 3 parts. As for me, I was good at ASM and RAC Administration, but was not comfortable with the Grid Infrastructure Installation and Administration part, which I barely passed.

You may be an Oracle high availability expert and still fail the exam. You might have experience but fail because of rarely used (or maybe uncommon) features and topics that you didn’t practice, didn’t read about, or read only superficially, because most of the questions check theoretical knowledge rather than practical experience. I have managed highly available cluster databases for the last 8 years, and it was really hard to answer some of the questions about things that I had never faced and never saw a reason to try.
There were a lot of questions like “Choose four options, where blah blah blah ….”, and you have to choose 4 options out of 7. You might know 3 correct answers, but because of that 1 wrong option you might fail.

Next, you have to achieve a minimum score in all 3 sections in order to pass the entire exam. You might complete 2 sections with 100%, fail the third one, and end up failing the entire exam.

 

How to prepare for the exam?

You have to read the documentation and play with ASM, RAC database and Grid Infrastructure A LOT!

If you want to learn Oracle 12c Grid Infrastructure installation, check the following video tutorial:

http://www.oraclevideotutorials.com/video/installing-oracle-12cr2-grid-infrastructure

 

Check the videos section in oraclevideotutorials.com to find out some clusterware related hands-on practices:

http://www.oraclevideotutorials.com/videos

 

The only available book related to the exam (mostly the RAC part) is the following one, which is worth reading and was written by friends of mine, Syed Jaffar, Kai Yu and Riyaj Shamsudeen:

Expert Oracle RAC 12c
https://www.amazon.com/Expert-Oracle-RAC-Experts-Voice/dp/1430250445/

 

In my OCM preparation book, I have two chapters that can help you during the preparation:

Chapter 7 – Grid Infrastructure and ASM

Chapter 8 – Real Application Clusters.

 

To get a free trial PDF copy of the book, go to www.ocmguide.com, or purchase it from the following link:

https://www.amazon.com/Oracle-Certified-Master-Exam-Guide/dp/1536800791/

 

During the exam, I regretted skipping some chapters in the documentation and reading some of them superficially. I highly recommend checking the ASM, RAC and Grid Infrastructure documentation and making sure you have gone through the entire documentation at least once. Here are the links to the documentation:

 

Real Application Clusters Administration and Deployment Guide

https://docs.oracle.com/en/database/oracle/oracle-database/12.2/racad/toc.htm

 

Clusterware Administration and Deployment Guide

https://docs.oracle.com/en/database/oracle/oracle-database/12.2/cwadd/toc.htm

 

Automatic Storage Management Administrator’s Guide

https://docs.oracle.com/en/database/oracle/oracle-database/12.2/ostmg/toc.htm

 

Setting deadlines and booking the exam

Most of you (including me) postpone the exam and don’t set deadlines for the preparation and for the exam itself. My advice: set an approximate date for the exam and make a plan for each month, week and day. Then set a date and book the exam! Yes, book it, as you can rebook if you don’t feel ready, as long as it’s more than 24 hours before the exam. Registering for the exam weeks before the exam date will push you to complete your preparation on time.

 

I booked the exam for Tuesday, rebooked it to Wednesday, then to Thursday, and then to Friday :). On Wednesday I decided to reschedule it to the next Monday, and in the evening I was shocked to see that I hadn’t actually rescheduled it to Friday. It was going to happen tomorrow (on Thursday), in just a few hours! :)

 

I didn’t feel ready and still had a few incomplete sections where I felt weak. I was even about to cancel the exam and not attend, but then I decided to push hard and try. And if I lost, I decided to lose like a champ :)

 

So I stayed awake till 3am, took a nap till 6am and made my last preparations till 9am. I attended the exam at 10am completely exhausted, overworked and sleepy.

Fortunately I passed the exam successfully and wish you the same.

This is my experience with the Oracle Database 12c: RAC and Grid Infrastructure Administration exam (1Z0-068). Let me know if you plan to take the exam, so I can guide you through it in more detail.

Good luck!

Posted in RAC issues | 1 Comment »

Using a deprecated ASM parameter might prevent your cluster from starting

Posted by Kamran Agayev A. on October 20th, 2017

A few days ago, I was testing some ASM parameters in my 3-node 12.2 Clusterware environment and used the ASM_PREFERRED_READ_FAILURE_GROUPS parameter to see how I could force ASM to read from a specific failure group. The tests were successful, but I didn’t know that this parameter is deprecated in 12.2, and I didn’t imagine that it might cause downtime and prevent Clusterware from starting.

Here’s the scenario that you can try in your test environment. First of all, I set this parameter to a failure group and then reset it back to an empty value:

SQL> alter system set ASM_PREFERRED_READ_FAILURE_GROUPS='';

System altered.

SQL> 

 

Then I made some hardware changes to my nodes and rebooted them. After the nodes were rebooted, I checked the status of the clusterware, and it was down on all nodes.

 

[oracle@oratest01 ~]$ crsctl check crs

CRS-4638: Oracle High Availability Services is online

CRS-4535: Cannot communicate with Cluster Ready Services

CRS-4529: Cluster Synchronization Services is online

CRS-4534: Cannot communicate with Event Manager

 

 

[oracle@oratest01 ~]$ crsctl check cluster -all

**************************************************************

oratest01:

CRS-4535: Cannot communicate with Cluster Ready Services

CRS-4529: Cluster Synchronization Services is online

CRS-4534: Cannot communicate with Event Manager

**************************************************************

oratest02:

CRS-4535: Cannot communicate with Cluster Ready Services

CRS-4530: Communications failure contacting Cluster Synchronization Services daemon

CRS-4534: Cannot communicate with Event Manager

**************************************************************

oratest03:

CRS-4535: Cannot communicate with Cluster Ready Services

CRS-4530: Communications failure contacting Cluster Synchronization Services daemon

CRS-4534: Cannot communicate with Event Manager

**************************************************************
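
On a cluster with more nodes, scanning this output by eye gets tedious. Below is a minimal Python sketch (not part of my original troubleshooting session) that turns the text of `crsctl check cluster -all` into a per-node summary. The function names and the `STATUS_CODES` table are my own illustration, and the table only covers the CRS message codes that appear in this post, so treat it as a starting point rather than a complete list.

```python
import re

# CRS message codes seen in this post, mapped to (service, is_online).
# Assumption: this is only the subset shown above, not every code crsctl can print.
STATUS_CODES = {
    "CRS-4537": ("crs", True),
    "CRS-4535": ("crs", False),
    "CRS-4529": ("css", True),
    "CRS-4530": ("css", False),
    "CRS-4533": ("evm", True),
    "CRS-4534": ("evm", False),
}


def parse_cluster_check(output):
    """Return {node: {service: bool}} from `crsctl check cluster -all` text."""
    status = {}
    node = None
    for raw in output.splitlines():
        line = raw.strip()
        # Skip blank lines and the rows of asterisks that separate nodes.
        if not line or set(line) == {"*"}:
            continue
        match = re.match(r"^(CRS-\d+):", line)
        if match:
            # A status message belongs to the most recent node header.
            if node is not None and match.group(1) in STATUS_CODES:
                service, online = STATUS_CODES[match.group(1)]
                status[node][service] = online
        elif line.endswith(":"):
            # A line such as "oratest01:" starts a new node section.
            node = line[:-1]
            status[node] = {}
    return status


def offline_services(status):
    """Return {node: [services that are not online]} for unhealthy nodes only."""
    return {
        node: sorted(name for name, up in services.items() if not up)
        for node, services in status.items()
        if not all(services.values())
    }
```

Feed it the captured command output (for example from `subprocess.run(["crsctl", "check", "cluster", "-all"], capture_output=True, text=True).stdout`) and `offline_services()` lists only the nodes that need attention.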

 

Next, I checked whether the ohasd and crsd background processes were up:

[root@oratest01 oracle]# ps -ef|grep init.ohasd|grep -v grep

root      1252     1  0 02:49 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null

[root@oratest01 oracle]#

 

[root@oratest01 oracle]# ps -ef|grep crsd|grep -v grep

[root@oratest01 oracle]#

 

OHAS was up and running, but CRSD was not. The ASM instance must be up in order to bring up crsd, so I checked whether the ASM instance was up, but it was also down:

[oracle@oratest01 ~]$ ps -ef | grep smon

oracle    5473  3299  0 02:50 pts/0    00:00:00 grep --color=auto smon

[oracle@oratest01 ~]$

 

 

 

Next, I decided to check the log files. I logged in to adrci to find the centralized Clusterware log folder:

 

[oracle@oratest01 ~]$ adrci

ADRCI: Release 12.2.0.1.0 - Production on Fri Oct 20 02:51:59 2017

Copyright (c) 1982, 2017, Oracle and/or its affiliates.  All rights reserved.

ADR base = "/u01/app/oracle"

adrci> show home

ADR Homes:

diag/rdbms/_mgmtdb/-MGMTDB

diag/rdbms/proddb/proddb1

diag/asm/user_root/host_4288267646_107

diag/asm/user_oracle/host_4288267646_107

diag/asm/+asm/+ASM1

diag/crs/oratest01/crs

diag/clients/user_root/host_4288267646_107

diag/clients/user_oracle/host_4288267646_107

diag/tnslsnr/oratest01/asmnet1lsnr_asm

diag/tnslsnr/oratest01/listener_scan1

diag/tnslsnr/oratest01/listener_scan2

diag/tnslsnr/oratest01/listener_scan3

diag/tnslsnr/oratest01/listener

diag/tnslsnr/oratest01/mgmtlsnr

diag/asmtool/user_root/host_4288267646_107

diag/asmtool/user_oracle/host_4288267646_107

diag/apx/+apx/+APX1

diag/afdboot/user_root/host_4288267646_107

adrci> exit

[oracle@oratest01 ~]$ cd /u01/app/oracle/diag/crs/oratest01/crs

[oracle@oratest01 crs]$ cd trace

 

[oracle@oratest01 trace]$ tail -f evmd.trc

2017-10-20 02:54:26.533 :  CRSOCR:2840602368:  OCR context init failure.  Error: PROC-32: Cluster Ready Services on the local node is not running Messaging error [gipcretConnectionRefused] [29]

2017-10-20 02:54:27.552 :  CRSOCR:2840602368:  OCR context init failure.  Error: PROC-32: Cluster Ready Services on the local node is not running Messaging error [gipcretConnectionRefused] [29]

2017-10-20 02:54:28.574 :  CRSOCR:2840602368:  OCR context init failure.  Error: PROC-32: Cluster Ready Services on the local node is not running Messaging error [gipcretConnectionRefused] [29]

 

From the evmd.trc file it can be seen that OCR was not initialized. Then I checked the alert.log file:

 

[oracle@oratest01 trace]$ tail -f alert.log

2017-10-20 02:49:49.613 [OCSSD(3825)]CRS-1605: CSSD voting file is online: AFD:DATA1; details in /u01/app/oracle/diag/crs/oratest01/crs/trace/ocssd.trc.

2017-10-20 02:49:49.627 [OCSSD(3825)]CRS-1672: The number of voting files currently available 1 has fallen to the minimum number of voting files required 1.

2017-10-20 02:49:58.812 [OCSSD(3825)]CRS-1601: CSSD Reconfiguration complete. Active nodes are oratest01 .

2017-10-20 02:50:01.154 [OCTSSD(5351)]CRS-8500: Oracle Clusterware OCTSSD process is starting with operating system process ID 5351

2017-10-20 02:50:01.161 [OCSSD(3825)]CRS-1720: Cluster Synchronization Services daemon (CSSD) is ready for operation.

2017-10-20 02:50:02.099 [OCTSSD(5351)]CRS-2403: The Cluster Time Synchronization Service on host oratest01 is in observer mode.

2017-10-20 02:50:03.233 [OCTSSD(5351)]CRS-2407: The new Cluster Time Synchronization Service reference node is host oratest01.

2017-10-20 02:50:03.235 [OCTSSD(5351)]CRS-2401: The Cluster Time Synchronization Service started on host oratest01.

2017-10-20 02:50:10.454 [ORAAGENT(3362)]CRS-5011: Check of resource "ora.asm" failed: details at "(:CLSN00006:)" in "/u01/app/oracle/diag/crs/oratest01/crs/trace/ohasd_oraagent_oracle.trc"

2017-10-20 02:50:18.692 [ORAROOTAGENT(3198)]CRS-5019: All OCR locations are on ASM disk groups [DATA], and none of these disk groups are mounted. Details are at "(:CLSN00140:)" in "/u01/app/oracle/diag/crs/oratest01/crs/trace/ohasd_orarootagent_root.trc".
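
The Clusterware alert.log mixes routine startup messages with the ones that matter. As a rough illustration (again, not something I ran at the time), here is a small Python sketch that splits lines of the format shown above into fields and keeps only chosen CRS codes. The regular expression is an assumption based purely on the lines in this post, and `AlertEntry`, `parse_alert_line` and `entries_with_codes` are hypothetical names.

```python
import re
from typing import NamedTuple


class AlertEntry(NamedTuple):
    timestamp: str
    process: str
    pid: int
    code: str
    message: str


# Assumed line layout, taken from the sample above:
#   2017-10-20 02:50:18.692 [ORAROOTAGENT(3198)]CRS-5019: message text
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"\[(?P<proc>[A-Z]+)\((?P<pid>\d+)\)\]"
    r"(?P<code>CRS-\d+): (?P<msg>.*)$"
)


def parse_alert_line(line):
    """Return an AlertEntry, or None if the line does not match the layout."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return AlertEntry(
        match["ts"], match["proc"], int(match["pid"]), match["code"], match["msg"]
    )


def entries_with_codes(lines, codes):
    """Keep only the parsed entries whose CRS code is in `codes`."""
    wanted = set(codes)
    parsed = (parse_alert_line(line) for line in lines)
    return [entry for entry in parsed if entry is not None and entry.code in wanted]
```

For the log above, calling `entries_with_codes(open("alert.log"), ["CRS-5011", "CRS-5019"])` would leave just the two lines that point at ASM.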

 

CRS didn’t start because ASM was not up and running. Checking why ASM hadn’t started at server boot sounded like a good starting point for the investigation, so I logged in and tried to start the ASM instance:

 

[oracle@oratest01 ~]$ sqlplus / as sysasm

SQL*Plus: Release 12.2.0.1.0 Production on Fri Oct 20 02:55:12 2017

Copyright (c) 1982, 2016, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup

ORA-01078: failure in processing system parameters

SQL> startup

ORA-01078: failure in processing system parameters

SQL> startup

ORA-01078: failure in processing system parameters

SQL>

 

I checked the ASM alert.log file, but it didn’t provide enough information about why ASM didn’t start:

NOTE: ASM client -MGMTDB:_mgmtdb:clouddb disconnected unexpectedly.
NOTE: check client alert log.
NOTE: Trace records dumped in trace file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ufg_20658_-MGMTDB__mgmtdb.trc
NOTE: cleaned up ASM client -MGMTDB:_mgmtdb:clouddb connection state (reg:2993645709)
2017-10-20T02:47:20.588256-04:00
NOTE: client +APX1:+APX:clouddb deregistered
2017-10-20T02:47:21.201319-04:00
NOTE: detected orphaned client id 0x10004.
2017-10-20T02:48:49.613505-04:00
WARNING: Write Failed, will retry. group:2 disk:0 AU:9067 offset:151552 size:4096
path:AFD:DATA1
incarnation:0xf0a9ba5e synchronous result:'I/O error'
subsys:/opt/oracle/extapi/64/asm/orcl/1/libafd12.so krq:0x7f8fced52240 bufp:0x7f8fc9262000 osderr1:0xfffffff8 osderr2:0xc28
IO elapsed time: 0 usec Time waited on I/O: 0 usec
ERROR: unrecoverable error ORA-15311 raised in ASM I/O path; terminating process 20200

 

The problem seemed to be in the ASM parameter file, so I decided to start ASM with default parameters and then investigate. For this, I searched for the string “parameters” in the ASM alert.log file to get the list of parameters and the parameter file location:

[oracle@oratest01 trace]$ more +ASM1_alert.log

Using parameter settings in server-side spfile +DATA/clouddb/ASMPARAMETERFILE/registry.253.949654249

System parameters with non-default values:

  large_pool_size          = 12M

  remote_login_passwordfile= "EXCLUSIVE"

  asm_diskstring           = "/dev/sd*"

  asm_diskstring           = "AFD:*"

  asm_diskgroups           = "NEW"

  asm_diskgroups           = "TESTDG"

  asm_power_limit          = 1

  _asm_max_connected_clients= 4

NOTE: remote asm mode is remote (mode 0x202; from cluster type)

2017-08-11T10:22:24.834431-04:00

Cluster Communication is configured to use IPs from: GPnP
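
Copying these values into a pfile by hand is easy to get wrong. As a sketch only, the Python below pulls the most recent "System parameters with non-default values" section out of alert log text and prints it in pfile form. The function names are hypothetical, the `*.` prefix and the conversion of double quotes to single quotes are my assumptions about a reasonable pfile layout, and you should review the result (for example, to leave out the parameter you suspect) before starting an instance with it.

```python
import re

HEADER = "System parameters with non-default values:"
# Matches lines such as:  asm_diskstring           = "AFD:*"
PARAM_RE = re.compile(r"^\s*([A-Za-z_]\w*)\s*=\s*(.+?)\s*$")


def non_default_parameters(alert_log_text):
    """Return {name: [values]} from the last parameter dump in the alert log."""
    params = {}
    in_section = False
    for line in alert_log_text.splitlines():
        stripped = line.strip()
        if stripped == HEADER:
            # Each instance startup writes a new dump; keep only the newest one.
            params, in_section = {}, True
            continue
        if not in_section or not stripped:
            continue
        match = PARAM_RE.match(line)
        if match is None:
            # First non-parameter line (e.g. "NOTE: ...") ends the section.
            in_section = False
            continue
        name, value = match.groups()
        # Multi-valued parameters are logged once per value; collect them all.
        params.setdefault(name, []).append(value.replace('"', "'"))
    return params


def to_pfile(params):
    """Render the parameters as pfile lines valid for all instances (*.)."""
    lines = [f"*.{name}={','.join(values)}" for name, values in params.items()]
    return "\n".join(lines) + "\n"
```

Writing `to_pfile(non_default_parameters(text))` to a file gives a draft pfile to edit before running `startup pfile=...`.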

 

Then I created a parameter file (/home/oracle/pfile_asm.ora) and started the instance:

SQL> startup pfile='/home/oracle/pfile_asm.ora';

ASM instance started

 

Total System Global Area 1140850688 bytes

Fixed Size                                8629704 bytes

Variable Size                      1107055160 bytes

ASM Cache                            25165824 bytes

ASM diskgroups mounted

SQL> exit

 

Great! ASM is up. Now I can extract my original parameter file from the spfile and try to start ASM with it:

 

[oracle@oratest01 ~]$ sqlplus / as sysasm

SQL> create pfile='/home/oracle/pfile_orig.ora' from spfile='+DATA/clouddb/ASMPARAMETERFILE/registry.253.957837377';

File created.

SQL> 

 

And here are the contents of my original ASM parameter file:

[oracle@oratest01 ~]$ more /home/oracle/pfile_orig.ora

+ASM1.__oracle_base='/u01/app/oracle'#ORACLE_BASE set from in memory value

+ASM2.__oracle_base='/u01/app/oracle'#ORACLE_BASE set from in memory value

+ASM3.__oracle_base='/u01/app/oracle'#ORACLE_BASE set from in memory value

+ASM3._asm_max_connected_clients=5

+ASM2._asm_max_connected_clients=8

+ASM1._asm_max_connected_clients=5

*.asm_diskgroups='DATA','ACFSDG'#Manual Mount

*.asm_diskstring='/dev/sd*','AFD:*'

*.asm_power_limit=1

*.asm_preferred_read_failure_groups=''

*.large_pool_size=12M

*.remote_login_passwordfile='EXCLUSIVE'

 

Good. Now let’s start ASM with it:

SQL> shut abort

ASM instance shutdown

SQL> startup pfile='/home/oracle/pfile_orig.ora';

ORA-32006: ASM_PREFERRED_READ_FAILURE_GROUPS initialization parameter has been deprecated

 

ORA-01078: failure in processing system parameters

SQL>

 

Whoa. ASM failed to start because of a deprecated parameter?! Let’s remove it from the pfile and start ASM without the ASM_PREFERRED_READ_FAILURE_GROUPS parameter:

[oracle@oratest01 ~]$ sqlplus / as sysasm

Connected to an idle instance.

SQL> startup pfile='/home/oracle/pfile_orig.ora';

ASM instance started

 

Total System Global Area 1140850688 bytes

Fixed Size                                8629704 bytes

Variable Size                      1107055160 bytes

ASM Cache                            25165824 bytes

ASM diskgroups mounted

SQL> 
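
I removed the line from the pfile by hand in a text editor. If you need to do the same on several environments, the edit can be scripted. The following Python sketch is only an illustration: `drop_parameters` is a hypothetical helper that removes every assignment of the named parameters from pfile text, whatever the instance prefix (`*.` or `+ASM1.`), and leaves all other lines untouched.

```python
def drop_parameters(pfile_text, names):
    """Return pfile text without any line that sets one of `names`."""
    unwanted = {name.lower() for name in names}
    kept = []
    for line in pfile_text.splitlines():
        if "=" in line:
            # Left side looks like "*.asm_power_limit" or "+ASM1.__oracle_base".
            key = line.split("=", 1)[0].strip()
            # Drop the instance prefix; parameter names are case-insensitive.
            param = key.rsplit(".", 1)[-1].lower()
            if param in unwanted:
                continue
        kept.append(line)
    return "\n".join(kept) + "\n"
```

Usage is a read, filter and write: read `/home/oracle/pfile_orig.ora`, pass its text with `["asm_preferred_read_failure_groups"]`, and write the result back (keeping a copy of the original file is a good idea).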

 

It started! Next, I created the ASM spfile from this pfile and restarted the instance:

SQL> create spfile='+DATA' from pfile='/home/oracle/pfile_orig.ora';

File created.

 

SQL> shut immediate

ASM diskgroups dismounted

ASM instance shutdown

 

SQL> startup

ASM instance started

Total System Global Area 1140850688 bytes

Fixed Size                                8629704 bytes

Variable Size                      1107055160 bytes

ASM Cache                            25165824 bytes

ASM diskgroups mounted

SQL> 

 

With ASM up and running, I restarted the clusterware on all nodes and checked the status:

[root@oratest01 ~]$ crsctl stop cluster -all

[root@oratest01 ~]$ crsctl start cluster -all

[oracle@oratest01 ~]$ crsctl check cluster -all

**************************************************************

oratest01:

CRS-4537: Cluster Ready Services is online

CRS-4529: Cluster Synchronization Services is online

CRS-4533: Event Manager is online

**************************************************************

CRS-4404: The following nodes did not reply within the allotted time:

oratest02, oratest03

 

The first node was up, but I wasn’t able to get the status of the clusterware on the other nodes and got the CRS-4404 error. To solve it, kill the gpnpd process on all nodes (it is respawned automatically, as the new PID below shows) and run the command again:

 

[oracle@oratest01 ~]$ ps -ef | grep gpn

oracle    3418     1  0 02:49 ?        00:00:15 /u01/app/12.2.0.1/grid/bin/gpnpd.bin

[oracle@oratest01 ~]$ kill -9 3418

[oracle@oratest01 ~]$ ps -ef | grep gpn

oracle   16169     1  3 06:52 ?        00:00:00 /u01/app/12.2.0.1/grid/bin/gpnpd.bin

 

[oracle@oratest01 ~]$ crsctl check cluster -all

**************************************************************

oratest01:

CRS-4537: Cluster Ready Services is online

CRS-4529: Cluster Synchronization Services is online

CRS-4533: Event Manager is online

**************************************************************

oratest02:

CRS-4537: Cluster Ready Services is online

CRS-4529: Cluster Synchronization Services is online

CRS-4533: Event Manager is online

**************************************************************

oratest03:

CRS-4537: Cluster Ready Services is online

CRS-4529: Cluster Synchronization Services is online

CRS-4533: Event Manager is online

**************************************************************

[oracle@oratest01 ~]$

 

From this blog post you can learn how to troubleshoot clusterware startup step by step, and why you should not use a deprecated ASM parameter.

Posted in RAC issues | No Comments »

OCM Exam Tips and Tricks at www.ocmguide.com

Posted by Kamran Agayev A. on October 13th, 2017

Dear friends

I hope most of you have already got my book and started preparing for the OCM exam. Every month I get emails from my readers, as well as from those who used my book and passed the OCM exam successfully!

If you haven’t subscribed to the OCM Newsletter and want to read the previous articles, use the following link:
http://www.ocmguide.com/category/ocm-tips-and-tricks/

 

If you want to get a free trial copy of the book in PDF format, use the following address:
http://www.ocmguide.com/

 

If you also want to successfully pass the exam, then use the following address to purchase the book:
https://www.amazon.com/Oracle-Certified-Master-Study-Guide/dp/1536800791/ref=sr_1_1?ie=UTF8&qid=1474879527&sr=8-1&keywords=oracle+exam+guide

 

If you are in my Facebook friend list, you already know that I collect pictures of my readers and make them famous on my Facebook account :) So if you are a reader of my book, please send me your photo with my book and become famous! :)

Please do not hesitate to contact me directly regarding any OCM topic you find complicated. And please post your comments about the book on Amazon and here on my blog. Your feedback is highly appreciated!

 


Posted in Uncategorized | 1 Comment »