RMAN jobs can back up a database to tape through NetBackup successfully, yet leave the server processes (RMAN channels) and NetBackup nborautil processes running afterwards. The leftover processes look like defunct processes.
The issue was observed with Oracle Database 19c RMAN backup jobs using NetBackup 8.1.2 on AIX 7.2; it may also happen on other platforms with NetBackup 8.1.2 or 8.2. When it happens, the nborautil processes can be seen running as follows:
$ ps -ef | grep nborautil
oracle 17236328 6489154 31 12:14:13 - 129:18 -bprdtype 2 -use_stdin -client host01 -bprd -noxmloutput -ignorenamespace -jsonoutput 26 -eoichar /usr/openv/netbackup/bin/nborautil
oracle 32506230 5440300 26 12:47:03 - 113:20 -bprdtype 2 -use_stdin -client host01 -bprd -noxmloutput -ignorenamespace -jsonoutput 26 -eoichar /usr/openv/netbackup/bin/nborautil
$ ps -ef | grep 6489154
oracle 17236328 6489154 30 12:14:13 - 129:44 -bprdtype 2 -use_stdin -client host01 -bprd -noxmloutput -ignorenamespace -jsonoutput 26 -eoichar /usr/openv/netbackup/bin/nborautil
oracle 6489154 1 0 Dec 27 - 11:47 oracleORCL (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
$
$ ps -ef | grep 5440300
oracle 32506230 5440300 24 12:47:03 - 113:44 -bprdtype 2 -use_stdin -client host01 -bprd -noxmloutput -ignorenamespace -jsonoutput 26 -eoichar /usr/openv/netbackup/bin/nborautil
oracle 5440300 1 0 Dec 28 - 10:19 oracleORCL (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
Two NetBackup nborautil processes appear defunct; their parent process IDs are 6489154 and 5440300 respectively. Both parents are server processes of the Oracle Database instance ORCL. Check the session status of these server processes:
sys@ORCL> select s.sid,s.serial#,s.username,s.machine,s.program,s.event
  2  from v$session s, v$process p
  3  where s.paddr=p.addr and p.spid in (5440300,6489154);

  SID    SERIAL# USERNAME  MACHINE  PROGRAM                   EVENT
----- ---------- --------- -------- ------------------------- --------------------
  300      44503 SYS       host01   rman@host01 (TNS V1-V3)   Backup: MML shutdown
  178      17334 SYS       host01   rman@host01 (TNS V1-V3)   Backup: MML shutdown
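The manual lookup above (find each nborautil process, then note its parent PID to feed into the v$process query) can be sketched as a small shell helper. This is only an illustration: the function name list_nborautil_parents is made up for this post, and it assumes `ps -ef` style input in which the PID and PPID are the second and third columns, as in the output shown earlier.

```shell
# Hypothetical helper: read `ps -ef` style lines on stdin and print
# "PID PPID" for every nborautil process found.
# Assumes column 2 is the PID and column 3 is the PPID.
list_nborautil_parents() {
    # Skip the line of the grep command itself, if one is present.
    awk '/nborautil/ && $0 !~ /grep/ { print $2, $3 }'
}
```

It would be used as `ps -ef | list_nborautil_parents`; the second column of its output gives the spid values to look up in v$process.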
The server processes are waiting on the event "Backup: MML shutdown", which means they are waiting for the NetBackup nborautil process to complete, and nborautil never exits, although Veritas Support claims it only takes longer than expected rather than hanging. From what I saw, every RMAN job leaves new nborautil processes running, and eventually these processes use up all CPU resources and hang the system.
Veritas Support says that this is a bug in NetBackup 8.1.2/8.2. The issue occurs on all Oracle backups that use an RMAN script if the database comprises many datafiles.
The Oracle backups show an unusually long delay between the completion of the last child job and the completion of the parent job. The delay may extend to several hours, even days.
No performance problems or slow behaviour are observed before the data transfer completes. The delay occurs only during metadata cataloging, after the data transfer jobs have completed.
Smaller databases and those with fewer datafiles may not experience this delay.
So far, Veritas has not released a fix for it. If possible, upgrade NetBackup to a higher version (e.g., 9). As a workaround, reducing the number of datafiles backed up in a single job may help.
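One way to try the workaround is to split the datafiles into smaller batches and back up each batch in its own RMAN job. The sketch below only generates the RMAN commands; the function name make_backup_batches and the batch size are assumptions for illustration, and whether a given batch size actually avoids the delay has not been verified here.

```shell
# Hypothetical helper: print one RMAN "BACKUP DATAFILE ..." command
# per batch of datafile numbers.
# Usage: make_backup_batches BATCH_SIZE FILE_NO [FILE_NO ...]
make_backup_batches() {
    batch_size=$1
    shift
    batch=""
    count=0
    for file_no in "$@"; do
        # Append the datafile number to the current comma-separated batch.
        if [ -z "$batch" ]; then
            batch=$file_no
        else
            batch="$batch,$file_no"
        fi
        count=$((count + 1))
        # Emit the batch once it is full and start a new one.
        if [ "$count" -eq "$batch_size" ]; then
            echo "BACKUP DATAFILE $batch;"
            batch=""
            count=0
        fi
    done
    # Emit the final, possibly shorter, batch.
    if [ -n "$batch" ]; then
        echo "BACKUP DATAFILE $batch;"
    fi
}
```

For example, `make_backup_batches 2 1 2 3 4 5` prints three commands covering datafiles 1,2 then 3,4 then 5. Each printed command would then go into a separate RMAN job with its own channel allocation for NetBackup.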