The init.ohasd process does not start the clusterware stack after the April 2020 Release Update is applied to a 12.2.0.1 Grid Infrastructure home. This is not a typical Grid Infrastructure startup issue: it is human error, and it happened because the DBA accidentally ran roothas.sh instead of rootcrs.sh while patching the clustered GI home. Before the human error was identified, the diagnosis proceeded as follows.
Check the status of the init.ohasd and ohasd.bin processes:
[root@host02]# ps -ef | grep has
root 3313 1 0 11:00 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
grid 30923 17981 0 12:01 pts/9 00:00:00 grep has
The ohasd.bin process has not been started automatically.
Make sure that CRS autostart is not disabled:
[root@host02]# /u01/app/12.2.0/grid/bin/crsctl config crs
CRS-4622: Oracle High Availability Services autostart is enabled.
The command shows that autostart is enabled. An alternative way to check the autostart status is to display the content of the ohasdstr file:
[root@host02]# cat /etc/oracle/scls_scr/host02/root/ohasdstr
enable
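For reference, if autostart had been disabled, it could be toggled as root with the standard crsctl enable/disable commands; they are shown here for completeness only, since autostart is already enabled in this case:
[root@host02]# /u01/app/12.2.0/grid/bin/crsctl disable crs    # turn off autostart on reboot
[root@host02]# /u01/app/12.2.0/grid/bin/crsctl enable crs     # turn it back on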
Try to start CRS manually:
[root@host02]# /u01/app/12.2.0/grid/bin/crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@host02]# ps -ef | grep has
root 3313 1 0 11:00 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
root 23329 1 0 12:07 ? 00:00:04 /u01/app/12.2.0/grid/bin/ohasd.bin reboot
grid 30923 17981 0 12:27 pts/9 00:00:00 grep has
[root@host02]# /u01/app/12.2.0/grid/bin/crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
[root@host02]# ps -ef | grep 'd\.bin'
The ohasd.bin daemon is now running, but the rest of the clusterware stack is still not started.
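Before digging into trace files, the init-level resources that ohasd itself manages are worth a look; the standard crsctl health check and the -init resource listing show how far the startup actually got (output omitted here):
[root@host02]# /u01/app/12.2.0/grid/bin/crsctl check crs
[root@host02]# /u01/app/12.2.0/grid/bin/crsctl stat res -t -init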
Check the trace file $ORACLE_BASE/crsdata/<node-name>/output/ohasdOUT.trc:
[root@host02]# tail /u01/app/grid/crsdata/host02/output/ohasdOUT.trc
Redirected Oracle Clusterware daemon standard output file /u01/app/grid/crsdata/host02/output/ohasdOUT.trc
Initializing Oracle Clusterware OHASD daemon with OS process ID 27459 on host host02 at local time 2020/04/21-22:02:29.433
Oracle Home is: /u01/app/12.2.0/grid
Oracle Base is: /u01/app/grid
NLS_LANG is: AMERICAN_AMERICA.AL32UTF8
Redirected Oracle Clusterware daemon standard output file /u01/app/grid/crsdata/host02/output/ohasdOUT.trc
Initializing Oracle Clusterware OHASD daemon with OS process ID 23329 on host host02 at local time 2020/05/13-12:17:31.952
Oracle Home is: /u01/app/12.2.0/grid
Oracle Base is: /u01/app/grid
NLS_LANG is: AMERICAN_AMERICA.AL32UTF8
OSD_main waitforWatchdogMonitor() : Blocking on FIFO
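The FIFO in the last line is the named pipe through which ohasd.bin performs its startup handshake with the init.ohasd watchdog; on a default Linux installation it lives under /var/tmp/.oracle (the pipe name npohasd below is assumed from such a default setup). A commonly cited way to check the pipe and unblock a stuck handshake is to read it once, although it cannot help here, since the real problem turns out to be the script itself:
[root@host02]# ls -l /var/tmp/.oracle/npohasd
[root@host02]# dd if=/var/tmp/.oracle/npohasd of=/dev/null bs=1024 count=1    # releases a writer blocked on the FIFO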
Check the trace file $ORACLE_BASE/diag/crs/<node-name>/crs/trace/ohasd.trc:
[root@host02]# tail /u01/app/grid/diag/crs/host02/crs/trace/ohasd.trc
Trace file /u01/app/grid/diag/crs/host02/crs/trace/ohasd.trc
Oracle Database 12c Clusterware Release 12.2.0.1.0 - Production Copyright 1996, 2016 Oracle. All rights reserved.
CLSB:3669426272: Argument count (argc) for this daemon is 2
CLSB:3669426272: Argument 0 is: /u01/app/12.2.0/grid/bin/ohasd.bin
CLSB:3669426272: Argument 1 is: reboot
2020-05-13 12:17:32.060 : default:3669426272: Created alert : (:OHAS00117:) : TIMED OUT WAITING FOR OHASD MONITOR
2020-05-13 12:17:32.060 :OHASDMAIN:3669426272: OHASD Daemon waiting for the watchdog monitor script to start
Check the alert log $ORACLE_BASE/diag/crs/<node-name>/crs/trace/alert.log:
[root@host02]# tail /u01/app/grid/diag/crs/host02/crs/trace/alert.log
2020-05-12 17:10:56.891 [CLSECHO(29775)]ACFS-9156: Detecting control device '/dev/asm/.asm_ctl_spec'.
2020-05-12 17:10:56.975 [CLSECHO(29794)]ACFS-9156: Detecting control device '/dev/ofsctl'.
2020-05-12 17:10:59.878 [CLSECHO(30496)]ACFS-9309: ADVM/ACFS installation correctness verified.
2020-05-12 17:11:00.587 [CLSCFG(30626)]CRS-1810: Node-specific configuration for node host02 in Oracle Local Registry was patched to patch level 2554537810.
2020-05-13 12:17:31.979 [OHASD(23329)]CRS-8500: Oracle Clusterware OHASD process is starting with operating system process ID 23329
2020-05-13 12:17:32.053 [OHASD(23329)]CRS-0715: Oracle High Availability Service has timed out waiting for init.ohasd to be started.
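The alert confirms that ohasd.bin timed out waiting for its init.ohasd counterpart, even though the earlier ps output shows /etc/init.d/init.ohasd run alive. That shifts suspicion from whether the script runs to what it contains. On a systemd-based Linux, the script is typically launched by a unit such as oracle-ohasd.service (the unit name may vary by platform and release):
[root@host02]# systemctl status oracle-ohasd
[root@host02]# less /etc/init.d/init.ohasd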
Something seems wrong with the init.ohasd script itself. Reviewing it reveals the following lines:
######### Instantiated Variables #########
ORA_CRS_HOME=/u01/app/12.2.0/grid
export ORA_CRS_HOME
HAS_USER=grid
SCRBASE=/etc/oracle/scls_scr
Here, HAS_USER should be root for clustered Grid Infrastructure; the value grid is used only for standalone Grid Infrastructure (Oracle Restart). It looks like an Oracle Restart configuration script rewrote init.ohasd. Checking the patching logs confirmed that the DBA ran roothas.sh instead of rootcrs.sh during patching. Luckily, this is a two-node cluster and the other node was patched properly.
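Since the other node was patched correctly, a quick way to confirm the damage is to diff the script against the healthy node's copy (host01 here); the wrong HAS_USER value stands out immediately:
[root@host02]# ssh host01 cat /etc/init.d/init.ohasd | diff /etc/init.d/init.ohasd -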
The fix is to copy the init.ohasd script from the healthy node over the damaged one and reboot the node.
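A minimal sketch of the repair, assuming the default Linux script location /etc/init.d/init.ohasd and keeping a backup of the damaged copy:
[root@host02]# cp -p /etc/init.d/init.ohasd /etc/init.d/init.ohasd.broken    # keep the bad copy for reference
[root@host02]# scp -p host01:/etc/init.d/init.ohasd /etc/init.d/init.ohasd   # pull the good script from the healthy node
[root@host02]# reboot
After the reboot, everything comes back: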
[root@host02]# ps -ef |grep has
root 23526 1 0 May13 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
root 33529 1 0 May13 ? 01:04:09 /u01/app/12.2.0/grid/bin/ohasd.bin reboot _ORA_BLOCKING_STACK_LOCALE=AMERICAN_AMERICA.AL32UTF8
oracle 29897 35245 0 22:04 pts/0 00:00:00 grep has
[root@host02]# ps -ef | grep 'd\.bin'
root 10858 1 0 May13 ? 01:05:09 /u01/app/12.2.0/grid/bin/ohasd.bin reboot
grid 11145 1 0 May13 ? 00:17:16 /u01/app/12.2.0/grid/bin/mdnsd.bin
grid 11147 1 0 May13 ? 01:07:38 /u01/app/12.2.0/grid/bin/evmd.bin
grid 11182 1 0 May13 ? 00:21:10 /u01/app/12.2.0/grid/bin/gpnpd.bin
grid 11240 1 0 May13 ? 01:19:11 /u01/app/12.2.0/grid/bin/gipcd.bin
grid 11311 1 0 May13 ? 01:20:05 /u01/app/12.2.0/grid/bin/ocssd.bin
root 11508 1 0 May13 ? 01:10:20 /u01/app/12.2.0/grid/bin/octssd.bin reboot
root 11752 1 1 May13 ? 04:27:06 /u01/app/12.2.0/grid/bin/osysmond.bin
root 11773 1 0 May13 ? 01:36:37 /u01/app/12.2.0/grid/bin/crsd.bin reboot
[root@host02]# /u01/app/12.2.0/grid/bin/crsctl stat res -t
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
ONLINE ONLINE host01 STABLE
ONLINE ONLINE host02 STABLE
ora.ASMNET2LSNR_ASM.lsnr
ONLINE ONLINE host01 STABLE
ONLINE ONLINE host02 STABLE
ora.DATA.dg
ONLINE ONLINE host01 STABLE
ONLINE ONLINE host02 STABLE
ora.LISTENER.lsnr
ONLINE ONLINE host01 STABLE
ONLINE ONLINE host02 STABLE
ora.chad
ONLINE ONLINE host01 STABLE
ONLINE ONLINE host02 STABLE
ora.net1.network
ONLINE ONLINE host01 STABLE
ONLINE ONLINE host02 STABLE
ora.ons
ONLINE ONLINE host01 STABLE
ONLINE ONLINE host02 STABLE
ora.proxy_advm
ONLINE ONLINE host01 STABLE
ONLINE ONLINE host02 STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE host01 STABLE
ora.LISTENER_SCAN2.lsnr
1 ONLINE ONLINE host02 STABLE
ora.LISTENER_SCAN3.lsnr
1 ONLINE ONLINE host02 STABLE
ora.MGMTLSNR
1 ONLINE ONLINE host02 169.254.76.22 192.168.7.2 192.168.7.10,STABLE
ora.asm
1 ONLINE ONLINE host01 Started,STABLE
2 ONLINE ONLINE host02 Started,STABLE
ora.db01.dbsvc01.svc
1 ONLINE ONLINE host02 STABLE
ora.db01.db
1 ONLINE ONLINE host02 Open,HOME=/u01/app/oracle/product/12.2.0/dbhome_1,STABLE
ora.cvu
1 ONLINE ONLINE host02 STABLE
ora.mgmtdb
1 ONLINE ONLINE host02 Open,STABLE
ora.qosmserver
1 ONLINE ONLINE host02 STABLE
ora.host01.vip
1 ONLINE ONLINE host01 STABLE
ora.host02.vip
1 ONLINE ONLINE host02 STABLE
ora.scan1.vip
1 ONLINE ONLINE host01 STABLE
ora.scan2.vip
1 ONLINE ONLINE host02 STABLE
ora.scan3.vip
1 ONLINE ONLINE host02 STABLE
--------------------------------------------------------------------------------
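As a final sanity check, daemon health on every node can be verified from either node with the cluster-wide check:
[root@host02]# /u01/app/12.2.0/grid/bin/crsctl check cluster -all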