Oracle 19c hot backup mode? (Part 1)

August 2, 2020, 8:45 am

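If you never maintained Oracle 8 or 9, you may have had little contact with hot backup mode; the technique was superseded by RMAN many years ago. Yet it is exactly what sent us down the wrong path during a recent 19c database failure. Some internal mechanism of the database triggered BEGIN BACKUP, the instance then crashed abnormally, and archived logs turned out to be missing. We also tried restoring from a backup, and in the end we recovered by using bbed to modify the data file headers. We have not yet found out why the files were in backup mode, but as a reminder: remember to check whether any files in your database are in hot backup mode. This post also explains what hot backup mode is.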

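Below I simulate the situation on a 19c multitenant database. Note that if you re-create the control file, the PDB names show up as unknown and you cannot switch to the PDB; the PDB names are recovered automatically the first time the CDB is opened.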

[oracle@oel7db1 ~]$ sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Sun Aug 2 04:44:15 2020
Version 19.3.0.0.0
Copyright (c) 1982, 2019, Oracle.  All rights reserved.

SQL> alter database archivelog;
Database altered.

SQL> alter database open;
Database altered.

SQL> show pdbs

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 PDB1                           MOUNTED

SQL> alter pluggable database pdb1 open;
Pluggable database altered.

SQL> select * from v$backup;

     FILE# STATUS                CHANGE# TIME          CON_ID
---------- ------------------ ---------- --------- ----------
         1 NOT ACTIVE                  0                    1
         3 NOT ACTIVE                  0                    1
         4 NOT ACTIVE                  0                    1
         5 NOT ACTIVE                  0                    2
         6 NOT ACTIVE                  0                    2
         7 NOT ACTIVE                  0                    1
         8 NOT ACTIVE                  0                    2
         9 NOT ACTIVE                  0                    3
        10 NOT ACTIVE                  0                    3
        11 NOT ACTIVE                  0                    3
        12 NOT ACTIVE                  0                    3
       182 NOT ACTIVE                  0                    3
	   
SQL> select a.con_id,a.name,b.file#,b.rfile#,b.checkpoint_change#,b.checkpoint_time,b.status from  v$containers a,v$datafile b where a.con_id=b.con_id order by

    CON_ID NAME            FILE#     RFILE# CHECKPOINT_CHANGE# CHECKPOINT_TIME     STATUS
---------- ---------- ---------- ---------- ------------------ ------------------- -------
         2 PDB$SEED            8          9            2144549 2020-03-20 06:14:40 ONLINE
         2 PDB$SEED            6          4            2144549 2020-03-20 06:14:40 ONLINE
         2 PDB$SEED            5          1            2144549 2020-03-20 06:14:40 SYSTEM
         3 PDB1              182        181            4102092 2020-06-16 05:25:52 ONLINE
         1 CDB$ROOT            4          4            5061692 2020-08-02 04:45:04 ONLINE
         1 CDB$ROOT            3          3            5061692 2020-08-02 04:45:04 ONLINE
         1 CDB$ROOT            1          1            5061692 2020-08-02 04:45:04 SYSTEM
         1 CDB$ROOT            7          7            5061692 2020-08-02 04:45:04 ONLINE
         3 PDB1               10          4            5062895 2020-08-02 04:46:20 ONLINE
         3 PDB1               11          9            5062895 2020-08-02 04:46:20 ONLINE
         3 PDB1               12         12            5062895 2020-08-02 04:46:20 ONLINE
         3 PDB1                9          1            5062895 2020-08-02 04:46:20 SYSTEM

12 rows selected.

SQL> alter database begin backup;
Database altered.

SQL> @log
Show redo log layout from V$LOG, V$STANDBY_LOG and V$LOGFILE...

    GROUP#    THREAD#  SEQUENCE#      BYTES  BLOCKSIZE    MEMBERS ARC STATUS                FIRST_CHANGE# FIRST_TIME                    NEXT_CHANGE# NEXT_TIME
---------- ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------------ ------------------- ---------------------- ----------
         1          1         31  209715200        512          1 YES INACTIVE                    4920596 2020-07-28 05:04:50                5027660 2020-07-28
         2          1         32  209715200        512          1 NO  CURRENT                     5027660 2020-07-28 23:10:19   18446744073709551615
         3          1         30  209715200        512          1 YES INACTIVE                    4806829 2020-07-21 23:13:41                4920596 2020-07-28

SQL> alter system switch logfile;
System altered.

SQL> alter system switch logfile;
System altered.

SQL> alter system switch logfile;
System altered.

SQL> @log
Show redo log layout from V$LOG, V$STANDBY_LOG and V$LOGFILE...

    GROUP#    THREAD#  SEQUENCE#      BYTES  BLOCKSIZE    MEMBERS ARC STATUS                FIRST_CHANGE# FIRST_TIME                    NEXT_CHANGE# NEXT_TIME
---------- ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------------ ------------------- ---------------------- ----------
         1          1         34  209715200        512          1 YES INACTIVE                    5063932 2020-08-02 04:51:47                5063936 2020-08-02
         2          1         35  209715200        512          1 NO  CURRENT                     5063936 2020-08-02 04:51:49   18446744073709551615
         3          1         33  209715200        512          1 YES INACTIVE                    5063929 2020-08-02 04:51:45                5063932 2020-08-02

SQL>  select a.con_id,a.name,b.file#,b.rfile#,b.checkpoint_change#,b.checkpoint_time,b.status from  v$containers a,v$datafile b where a.con_id=b.con_id order b

    CON_ID NAME            FILE#     RFILE# CHECKPOINT_CHANGE# CHECKPOINT_TIME     STATUS
---------- ---------- ---------- ---------- ------------------ ------------------- -------
         2 PDB$SEED            8          9            2144549 2020-03-20 06:14:40 ONLINE
         2 PDB$SEED            6          4            2144549 2020-03-20 06:14:40 ONLINE
         2 PDB$SEED            5          1            2144549 2020-03-20 06:14:40 SYSTEM
         3 PDB1              182        181            4102092 2020-06-16 05:25:52 ONLINE
         3 PDB1               10          4            5063905 2020-08-02 04:51:25 ONLINE
         3 PDB1               11          9            5063905 2020-08-02 04:51:25 ONLINE
         3 PDB1               12         12            5063905 2020-08-02 04:51:25 ONLINE
         3 PDB1                9          1            5063905 2020-08-02 04:51:25 SYSTEM
         1 CDB$ROOT            4          4            5063905 2020-08-02 04:51:25 ONLINE
         1 CDB$ROOT            3          3            5063905 2020-08-02 04:51:25 ONLINE
         1 CDB$ROOT            1          1            5063905 2020-08-02 04:51:25 SYSTEM
         1 CDB$ROOT            7          7            5063905 2020-08-02 04:51:25 ONLINE

12 rows selected.

SQL> alter system checkpoint;
System altered.

SQL>  select a.con_id,a.name,b.file#,b.rfile#,b.checkpoint_change#,b.checkpoint_time,b.status from  v$containers a,v$datafile b where a.con_id=b.con_id order b

    CON_ID NAME            FILE#     RFILE# CHECKPOINT_CHANGE# CHECKPOINT_TIME     STATUS
---------- ---------- ---------- ---------- ------------------ ------------------- -------
         2 PDB$SEED            8          9            2144549 2020-03-20 06:14:40 ONLINE
         2 PDB$SEED            6          4            2144549 2020-03-20 06:14:40 ONLINE
         2 PDB$SEED            5          1            2144549 2020-03-20 06:14:40 SYSTEM
         3 PDB1              182        181            4102092 2020-06-16 05:25:52 ONLINE
         3 PDB1               10          4            5063905 2020-08-02 04:51:25 ONLINE
         3 PDB1               11          9            5063905 2020-08-02 04:51:25 ONLINE
         3 PDB1               12         12            5063905 2020-08-02 04:51:25 ONLINE
         3 PDB1                9          1            5063905 2020-08-02 04:51:25 SYSTEM
         1 CDB$ROOT            4          4            5063905 2020-08-02 04:51:25 ONLINE
         1 CDB$ROOT            3          3            5063905 2020-08-02 04:51:25 ONLINE
         1 CDB$ROOT            1          1            5063905 2020-08-02 04:51:25 SYSTEM
         1 CDB$ROOT            7          7            5063905 2020-08-02 04:51:25 ONLINE

SQL> @dbinfo

      DBID NAME       CREATED             LOG_MODE     CHECKPOINT_CHANGE# OPEN_MODE            FORCE_LOGGING
---------- ---------- ------------------- ------------ ------------------ -------------------- ---------------------------------------
3414393273 ANBOB19C   2020-03-20 05:43:21 ARCHIVELOG              5063984 READ WRITE           NO


SQL> shut immediate
ORA-01149: cannot shutdown - file 1 has online backup set
ORA-01110: data file 1: '/u01/app/oracle/oradata/ANBOB19C/system01.dbf'
SQL> shut abort
ORACLE instance shut down.

[oracle@oel7db1 dbs]$ sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Sun Aug 2 05:06:44 2020
Version 19.3.0.0.0
Copyright (c) 1982, 2019, Oracle.  All rights reserved.
Connected to an idle instance.

SQL> startup
ORACLE instance started.

Total System Global Area 1073738888 bytes
Fixed Size                  9143432 bytes
Variable Size             792723456 bytes
Database Buffers          268435456 bytes
Redo Buffers                3436544 bytes
Database mounted.
ORA-10873: file 1 needs to be either taken out of backup mode or media recovered
ORA-01110: data file 1: '/u01/app/oracle/oradata/ANBOB19C/system01.dbf'

SQL>  @log
Show redo log layout from V$LOG, V$STANDBY_LOG and V$LOGFILE...

    GROUP#    THREAD#  SEQUENCE#      BYTES  BLOCKSIZE    MEMBERS ARC STATUS                FIRST_CHANGE# FIRST_TIME                    NEXT_CHANGE# NEXT_TIME
---------- ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------------ ------------------- ---------------------- ----------
         1          1         37  209715200        512          1 NO  CURRENT                     5063951 2020-08-02 04:52:15   18446744073709551615
         2          1         35  209715200        512          1 YES INACTIVE                    5063936 2020-08-02 04:51:49                5063947 2020-08-02
         3          1         36  209715200        512          1 YES INACTIVE                    5063947 2020-08-02 04:52:13                5063951 2020-08-02

SQL> select a.con_id,a.name,b.file#,b.rfile#,b.checkpoint_change#,b.checkpoint_time,b.status from
  v$containers a,v$datafile b where a.con_id=b.con_id order by

    CON_ID NAME                      FILE#     RFILE# CHECKPOINT_CHANGE# CHECKPOINT_TIME     STATUS
---------- -------------------- ---------- ---------- ------------------ ------------------- -------
         2 PDB$SEED                      8          9            2144549 2020-03-20 06:14:40 ONLINE
         2 PDB$SEED                      5          1            2144549 2020-03-20 06:14:40 SYSTEM
         2 PDB$SEED                      6          4            2144549 2020-03-20 06:14:40 ONLINE
         3 PDB1                        182        181            4102092 2020-06-16 05:25:52 ONLINE
         3 PDB1                         10          4            5063905 2020-08-02 04:51:25 ONLINE
         3 PDB1                         12         12            5063905 2020-08-02 04:51:25 ONLINE
         3 PDB1                         11          9            5063905 2020-08-02 04:51:25 ONLINE
         1 CDB$ROOT                      1          1            5063905 2020-08-02 04:51:25 SYSTEM
         3 PDB1                          9          1            5063905 2020-08-02 04:51:25 SYSTEM
         1 CDB$ROOT                      4          4            5063905 2020-08-02 04:51:25 ONLINE
         1 CDB$ROOT                      3          3            5063905 2020-08-02 04:51:25 ONLINE
         1 CDB$ROOT                      7          7            5063905 2020-08-02 04:51:25 ONLINE

12 rows selected.

SQL> select a.con_id,a.name,b.file#,b.rfile#,b.checkpoint_change#,b.checkpoint_time,b.status 
from  v$containers a,v$datafile_header b where a.con_id=b.con_id o

    CON_ID NAME                      FILE#     RFILE# CHECKPOINT_CHANGE# CHECKPOINT_TIME     STATUS
---------- -------------------- ---------- ---------- ------------------ ------------------- -------
         2 PDB$SEED                      8          9            2144549 2020-03-20 06:14:40 ONLINE
         2 PDB$SEED                      5          1            2144549 2020-03-20 06:14:40 ONLINE
         2 PDB$SEED                      6          4            2144549 2020-03-20 06:14:40 ONLINE
         3 PDB1                        182        181            4102092 2020-06-16 05:25:52 ONLINE
         3 PDB1                         10          4            5063905 2020-08-02 04:51:25 ONLINE
         3 PDB1                         12         12            5063905 2020-08-02 04:51:25 ONLINE
         3 PDB1                         11          9            5063905 2020-08-02 04:51:25 ONLINE
         1 CDB$ROOT                      1          1            5063905 2020-08-02 04:51:25 ONLINE
         3 PDB1                          9          1            5063905 2020-08-02 04:51:25 ONLINE
         1 CDB$ROOT                      4          4            5063905 2020-08-02 04:51:25 ONLINE
         1 CDB$ROOT                      3          3            5063905 2020-08-02 04:51:25 ONLINE
         1 CDB$ROOT                      7          7            5063905 2020-08-02 04:51:25 ONLINE

12 rows selected.

SQL> alter database datafile 1 end backup;
Database altered.

SQL> select a.con_id,a.name,b.file#,b.rfile#,b.checkpoint_change#,b.checkpoint_time,b.status from  v$containers a,v$datafile_header b where a.con_id=b.con_id o

    CON_ID NAME                      FILE#     RFILE# CHECKPOINT_CHANGE# CHECKPOINT_TIME     STATUS
---------- -------------------- ---------- ---------- ------------------ ------------------- -------
         2 PDB$SEED                      8          9            2144549 2020-03-20 06:14:40 ONLINE
         2 PDB$SEED                      5          1            2144549 2020-03-20 06:14:40 ONLINE
         2 PDB$SEED                      6          4            2144549 2020-03-20 06:14:40 ONLINE
         3 PDB1                        182        181            4102092 2020-06-16 05:25:52 ONLINE
         3 PDB1                         10          4            5063905 2020-08-02 04:51:25 ONLINE
         3 PDB1                         12         12            5063905 2020-08-02 04:51:25 ONLINE
         3 PDB1                         11          9            5063905 2020-08-02 04:51:25 ONLINE
         3 PDB1                          9          1            5063905 2020-08-02 04:51:25 ONLINE
         1 CDB$ROOT                      4          4            5063905 2020-08-02 04:51:25 ONLINE
         1 CDB$ROOT                      3          3            5063905 2020-08-02 04:51:25 ONLINE
         1 CDB$ROOT                      7          7            5063905 2020-08-02 04:51:25 ONLINE
         1 CDB$ROOT                      1          1            5063984 2020-08-02 04:53:31 ONLINE

12 rows selected.

SQL> alter database datafile 9 end backup;
alter database datafile 9 end backup
*
ERROR at line 1:
ORA-01516: nonexistent log file, data file, or temporary file "9" in the current container

SQL> alter pluggable database datafile 9 end backup;
alter pluggable database datafile 9 end backup
*
ERROR at line 1:
ORA-01109: database not open


SQL> @cc pdb1;
ALTER SESSION SET container = pdb1;
Session altered.
-- If the control file has been re-created, this step is not possible because the PDB name is unknown.

SQL> alter pluggable database datafile 9 end backup;

Pluggable database altered.

SQL> alter database datafile 11 end backup;

Database altered.

SQL> select a.con_id,a.name,b.file#,b.rfile#,b.checkpoint_change#,b.checkpoint_time,b.status from  v$containers a,v$datafile_header b where a.con_id=b.con_id o

    CON_ID NAME                      FILE#     RFILE# CHECKPOINT_CHANGE# CHECKPOINT_TIME     STATUS
---------- -------------------- ---------- ---------- ------------------ ------------------- -------
         3 PDB1                        182        181            4102092 2020-06-16 05:25:52 ONLINE
         3 PDB1                         10          4            5063905 2020-08-02 04:51:25 ONLINE
         3 PDB1                         12         12            5063905 2020-08-02 04:51:25 ONLINE
         3 PDB1                         11          9            5063984 2020-08-02 04:53:31 ONLINE
         3 PDB1                          9          1            5063984 2020-08-02 04:53:31 ONLINE

Use bbed to compare the headers of two files: one whose BEGIN BACKUP has not been ended yet, and one that has already been through END BACKUP.

SQL> select file#,name from v$datafile;

     FILE# NAME
---------- -------------------------------------------------------
         9 /u01/app/oracle/oradata/ANBOB19C/pdb1/system01.dbf
        10 /u01/app/oracle/oradata/ANBOB19C/pdb1/sysaux01.dbf
        11 /u01/app/oracle/oradata/ANBOB19C/pdb1/undotbs01.dbf
        12 /u01/app/oracle/oradata/ANBOB19C/pdb1/users01.dbf
       182 /u01/app/oracle/oradata/ANBOB19C/pdb1/tbs101.dbf

SQL> select to_char(5063984,'xxxxxxxxxxxx') from dual;

TO_CHAR(50639
-------------
       4d4530

SQL> select to_char(5063905,'xxxxxxxxxxxx') from dual;

TO_CHAR(50639
-------------
       4d44e1
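
The two conversions above can be cross-checked outside the database. Below is a minimal Python sketch that uses only the SCNs shown in this post; the helper name `scn_to_hex` is an illustrative assumption, and the 8-digit padding is chosen to match how bbed prints `kscnbas`.

```python
def scn_to_hex(scn: int) -> str:
    """Render an SCN base as the 8-digit hex string that bbed prints for kscnbas."""
    return f"0x{scn:08x}"

end_backup_scn = 5063984  # checkpoint SCN of the files that went through END BACKUP
frozen_scn = 5063905      # checkpoint SCN frozen in the files still in backup mode

print(scn_to_hex(end_backup_scn))  # 0x004d4530
print(scn_to_hex(frozen_scn))      # 0x004d44e1
```

These are the values to look for in the `kcvfhckp` and `kcvfhbcp` structures of the bbed dump that follows.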
-- file 10#
[oracle@oel7db1 ~]$ bbed blocksize=8192 mode=edit filename=/u01/app/oracle/oradata/ANBOB19C/pdb1/sysaux01.dbf
Password:

BBED: Release 2.0.0.0.0 - Limited Production on Sun Aug 2 05:22:14 2020

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

************* !!! For Oracle Internal Use only !!! ***************


BBED> p kcvfh
struct kcvfh, 1272 bytes                    @0
   struct kcvfhbfh, 20 bytes                @0
      ub1 type_kcbh                         @0        0x0b
      ub1 frmt_kcbh                         @1        0xa2
      ub2 wrp2_kcbh                         @2        0x0000
      ub4 rdba_kcbh                         @4        0x01000001
      ub4 bas_kcbh                          @8        0x00000000
      ub2 wrp_kcbh                          @12       0x0000
      ub1 seq_kcbh                          @14       0x01
      ub1 flg_kcbh                          @15       0x04 (KCBHFCKV)
      ub2 chkval_kcbh                       @16       0xb992
      ub2 spare3_kcbh                       @18       0x0000
   struct kcvfhhdr, 76 bytes                @20
      ub4 kccfhswv                          @20       0x00000000
      ub4 kccfhcvn                          @24       0x13000000
      ub4 kccfhdbi                          @28       0xcb8381b9
      text kccfhdbn[0]                      @32      A
      text kccfhdbn[1]                      @33      N
      text kccfhdbn[2]                      @34      B
      text kccfhdbn[3]                      @35      O
      text kccfhdbn[4]                      @36      B
      text kccfhdbn[5]                      @37      1
      text kccfhdbn[6]                      @38      9
      text kccfhdbn[7]                      @39      C
      ub4 kccfhcsq                          @40       0x00003330
      ub4 kccfhfsz                          @44       0x0000c800
      s_blkz kccfhbsz                       @48       0x00
      ub2 kccfhfno                          @52       0x000a
      ub2 kccfhtyp                          @54       0x0003
      ub4 kccfhacid                         @56       0x00000000
      ub4 kccfhcks                          @60       0x00000000
      text kccfhtag[0]                      @64
      text kccfhtag[1]                      @65
      text kccfhtag[2]                      @66
      text kccfhtag[3]                      @67
      text kccfhtag[4]                      @68
      text kccfhtag[5]                      @69
      text kccfhtag[6]                      @70
      text kccfhtag[7]                      @71
      text kccfhtag[8]                      @72
      text kccfhtag[9]                      @73
      text kccfhtag[10]                     @74
      text kccfhtag[11]                     @75
      text kccfhtag[12]                     @76
      text kccfhtag[13]                     @77
      text kccfhtag[14]                     @78
      text kccfhtag[15]                     @79
      text kccfhtag[16]                     @80
      text kccfhtag[17]                     @81
      text kccfhtag[18]                     @82
      text kccfhtag[19]                     @83
      text kccfhtag[20]                     @84
      text kccfhtag[21]                     @85
      text kccfhtag[22]                     @86
      text kccfhtag[23]                     @87
      text kccfhtag[24]                     @88
      text kccfhtag[25]                     @89
      text kccfhtag[26]                     @90
      text kccfhtag[27]                     @91
      text kccfhtag[28]                     @92
      text kccfhtag[29]                     @93
      text kccfhtag[30]                     @94
      text kccfhtag[31]                     @95
   ub4 kcvfhrdb                             @96       0x00400208
   struct kcvfhcrs, 8 bytes                 @100
      ub4 kscnbas                           @100      0x0020bdd8
      ub2 kscnwrp                           @104      0x8000
      ub2 kscnwrp2                          @106      0x0000
   ub4 kcvfhcrt                             @108      0x3db8e18f
   ub4 kcvfhrlc                             @112      0x3db8d9fe
   struct kcvfhrls, 8 bytes                 @116
      ub4 kscnbas                           @116      0x001d4fd1
      ub2 kscnwrp                           @120      0x8000
      ub2 kscnwrp2                          @122      0x0000
   ub4 kcvfhbti                             @124      0x3e6d6b4d
   struct kcvfhbsc, 8 bytes                 @128
      ub4 kscnbas                           @128      0x004d44e1
      ub2 kscnwrp                           @132      0x8000
      ub2 kscnwrp2                          @134      0x0000
   ub2 kcvfhbth                             @136      0x0001
   ub2 kcvfhsta                             @138      0x0001 (KCVFHHBP)
   struct kcvfhckp, 36 bytes                @484
      struct kcvcpscn, 8 bytes              @484
         ub4 kscnbas                        @484      0x004d44e1
         ub2 kscnwrp                        @488      0x8000
         ub2 kscnwrp2                       @490      0x0000
      ub4 kcvcptim                          @492      0x3e6d6b4d
      ub2 kcvcpthr                          @496      0x0001
      union u, 12 bytes                     @500
         struct kcvcprba, 12 bytes          @500
            ub4 kcrbaseq                    @500      0x00000020
            ub4 kcrbabno                    @504      0x000216d6
            ub2 kcrbabof                    @508      0x0010
      ub1 kcvcpetb[0]                       @512      0x02
      ub1 kcvcpetb[1]                       @513      0x00
      ub1 kcvcpetb[2]                       @514      0x00
      ub1 kcvcpetb[3]                       @515      0x00
      ub1 kcvcpetb[4]                       @516      0x00
      ub1 kcvcpetb[5]                       @517      0x00
      ub1 kcvcpetb[6]                       @518      0x00
      ub1 kcvcpetb[7]                       @519      0x00
   ub4 kcvfhcpc                             @140      0x0000008f
   ub4 kcvfhrts                             @144      0x3e5e9ba4
   ub4 kcvfhccc                             @148      0x0000008e
   struct kcvfhbcp, 36 bytes                @152
      struct kcvcpscn, 8 bytes              @152
         ub4 kscnbas                        @152      0x004d4530
         ub2 kscnwrp                        @156      0x8000
         ub2 kscnwrp2                       @158      0x0000
      ub4 kcvcptim                          @160      0x3e6d6bcb
      ub2 kcvcpthr                          @164      0x0001
      union u, 12 bytes                     @168
         struct kcvcprba, 12 bytes          @168
            ub4 kcrbaseq                    @168      0x00000025
            ub4 kcrbabno                    @172      0x00000003
            ub2 kcrbabof                    @176      0x0010
      ub1 kcvcpetb[0]                       @180      0x02
      ub1 kcvcpetb[1]                       @181      0x00
      ub1 kcvcpetb[2]                       @182      0x00
      ub1 kcvcpetb[3]                       @183      0x00
      ub1 kcvcpetb[4]                       @184      0x00
      ub1 kcvcpetb[5]                       @185      0x00
      ub1 kcvcpetb[6]                       @186      0x00
      ub1 kcvcpetb[7]                       @187      0x00
   ub4 kcvfhbhz                             @312      0x0000c800
   struct kcvfhxcd, 16 bytes                @316
      ub4 space_kcvmxcd[0]                  @316      0x00000000
      ub4 space_kcvmxcd[1]                  @320      0x00000000
      ub4 space_kcvmxcd[2]                  @324      0x00000000
      ub4 space_kcvmxcd[3]                  @328      0x00000000
   sword kcvfhtsn                           @332      1
   ub2 kcvfhtln                             @336      0x0006
   text kcvfhtnm[0]                         @338     S
   text kcvfhtnm[1]                         @339     Y
   text kcvfhtnm[2]                         @340     S
   text kcvfhtnm[3]                         @341     A
   text kcvfhtnm[4]                         @342     U
   text kcvfhtnm[5]                         @343     X
   text kcvfhtnm[6]                         @344
   text kcvfhtnm[7]                         @345
   text kcvfhtnm[8]                         @346
   text kcvfhtnm[9]                         @347
   text kcvfhtnm[10]                        @348
   text kcvfhtnm[11]                        @349
   text kcvfhtnm[12]                        @350
   text kcvfhtnm[13]                        @351
   text kcvfhtnm[14]                        @352
   text kcvfhtnm[15]                        @353
   text kcvfhtnm[16]                        @354
   text kcvfhtnm[17]                        @355
   text kcvfhtnm[18]                        @356
   text kcvfhtnm[19]                        @357
   text kcvfhtnm[20]                        @358
   text kcvfhtnm[21]                        @359
   text kcvfhtnm[22]                        @360
   text kcvfhtnm[23]                        @361
   text kcvfhtnm[24]                        @362
   text kcvfhtnm[25]                        @363
   text kcvfhtnm[26]                        @364
   text kcvfhtnm[27]                        @365
   text kcvfhtnm[28]                        @366
   text kcvfhtnm[29]                        @367
   ub4 kcvfhrfn                             @368      0x00000004
   struct kcvfhrfs, 8 bytes                 @372
      ub4 kscnbas                           @372      0x00000000
      ub2 kscnwrp                           @376      0x0000
      ub2 kscnwrp2                          @378      0x0000
   ub4 kcvfhrft                             @380      0x00000000
   struct kcvfhafs, 8 bytes                 @384
      ub4 kscnbas                           @384      0x00000000
      ub2 kscnwrp                           @388      0x0000
      ub2 kscnwrp2                          @390      0x0000
   ub4 kcvfhbbc                             @392      0x00000000
   ub4 kcvfhncb                             @396      0x00000000
   ub4 kcvfhmcb                             @400      0x00000000
   ub4 kcvfhlcb                             @404      0x00000000
   ub4 kcvfhbcs                             @408      0x00000000
   ub2 kcvfhofb                             @412      0x0000
   ub2 kcvfhnfb                             @414      0x0000
   ub4 kcvfhprc                             @416      0x3bf3129f
   struct kcvfhprs, 8 bytes                 @420
      ub4 kscnbas                           @420      0x00000001
      ub2 kscnwrp                           @424      0x0000
      ub2 kscnwrp2                          @426      0x0000
   struct kcvfhprfs, 8 bytes                @428
      ub4 kscnbas                           @428      0x00000000
      ub2 kscnwrp                           @432      0x0000
      ub2 kscnwrp2                          @434      0x0000
   ub4 kcvfhtrt                             @444      0x00000000

BBED>

# file 9#
BBED> set filename '/u01/app/oracle/oradata/ANBOB19C/pdb1/system01.dbf';
        FILENAME        /u01/app/oracle/oradata/ANBOB19C/pdb1/system01.dbf

-- like above
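
The `kcvcptim` fields in the file 10 dump are packed timestamps. The sketch below is not from the original post: it assumes the commonly described encoding (base year 1988, 12 months of 31 days each), and the function name is hypothetical. Under that assumption it reproduces the CHECKPOINT_TIME values shown earlier in the transcript.

```python
from datetime import datetime

def decode_kcvcptim(raw: int) -> datetime:
    """Decode a packed Oracle timestamp, assuming the 1988-based 31-day-month encoding."""
    raw, second = divmod(raw, 60)
    raw, minute = divmod(raw, 60)
    raw, hour = divmod(raw, 24)
    raw, day0 = divmod(raw, 31)      # zero-based day of month
    year0, month0 = divmod(raw, 12)  # years since 1988, zero-based month
    return datetime(1988 + year0, month0 + 1, day0 + 1, hour, minute, second)

print(decode_kcvcptim(0x3e6d6b4d))  # kcvfhckp.kcvcptim -> 2020-08-02 04:51:25
print(decode_kcvcptim(0x3e6d6bcb))  # kcvfhbcp.kcvcptim -> 2020-08-02 04:53:31
```

The first value matches the frozen checkpoint time of SCN 5063905, and the second matches the checkpoint time of SCN 5063984.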

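Put the two dumps into UE (UltraEdit) and compare them. Apart from the fields related to block address and block number, the ones we mainly care about are the checkpoint-related fields.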

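If BEGIN BACKUP has never been run, x$kcvfh.FHBCP_SCN is 0 by default; otherwise it is non-zero. After END BACKUP, x$kcvfh.FHSCN = x$kcvfh.FHBCP_SCN. The usual techniques for advancing the SCN cannot fix this situation; instead, use bbed to change the SCN in kcvfhckp to the SCN in kcvfhbcp. When there are many data files, bbed can copy the structure from one data file to others, for example: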

assign file 1 block 1 kcvfhckp = file 140 block 1 kcvfhckp
assign file 3 block 1 kcvfhckp = file 140 block 1 kcvfhckp
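
To make the header comparison concrete, here is a small Python sketch that summarizes the three fields discussed above, using the values bbed printed for file 10. The function name and the returned keys are illustrative assumptions, and only `kscnbas` is treated as the SCN (the wrap fields are ignored for simplicity).

```python
KCVFHHBP = 0x01  # "hot backup in progress" bit, shown next to kcvfhsta in the dump

def summarize_header(kcvfhsta: int, ckp_scn: int, bcp_scn: int) -> dict:
    """Summarize backup-related fields of a data file header (simplified)."""
    return {
        "in_backup_mode": bool(kcvfhsta & KCVFHHBP),
        "checkpoint_scn": ckp_scn,          # kcvfhckp.kcvcpscn.kscnbas
        "backup_checkpoint_scn": bcp_scn,   # kcvfhbcp.kcvcpscn.kscnbas
        "scn_gap": bcp_scn - ckp_scn,
    }

# Values taken from the file 10 dump in this post.
file10 = summarize_header(0x0001, 0x004d44e1, 0x004d4530)
print(file10)
```

For file 10 the flag is set and the frozen checkpoint SCN lags the backup checkpoint SCN by 79, which is the gap the bbed edit described above closes.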

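If the data files are in ASM, it takes a bit more work: you can use the internal procedure dbms_diskgroup.patchfile, provided by the dbms_diskgroup package, to transfer the file header block between ASM and the file system.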

Checkpoint Count
  - Allows detection of a restored data file or control file
  - Incremented at every checkpoint
Checkpoint Structure
  - Records the last checkpoint information
  - Frozen when the file is in hot backup mode
Backup checkpoint SCN
  - Updated during hot backups
  - Used in conjunction with the alter database end backup command

If the checkpoint count and the backup checkpoint structure match the information found in the control file, the end backup command succeeds and clears the hot backup fuzzy bit.

The next post covers the background of hot backup mode.

Oracle 19c hot backup mode? (Part 2)

August 2, 2020, 9:09 am

Oracle 19c hot backup mode? (Part 1)

Hot Backups
To take hot backups, the database must be in ARCHIVELOG mode. If you are using RMAN, the tablespace does not need to be put into backup mode.

Unlike user-managed tools, RMAN does not require extra logging or backup mode because it knows the format of data blocks. RMAN is guaranteed not to back up fractured blocks. During an RMAN backup, a database server session reads each data block and checks whether it is fractured by comparing the block header and footer. If a block is fractured, then the session rereads the block. If the same fracture is found, then the block is considered permanently corrupt. Also, RMAN does not need to freeze the data file header checkpoint because it knows the order in which the blocks are read, which enables it to capture a known good checkpoint for the file.
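
The reread behaviour described above can be sketched as a toy model in Python. This is not RMAN's implementation: the 4-byte head and tail stamps, the function names, and the simulated reads are all assumptions made for illustration, and any fracture on the second read is treated as permanent corruption.

```python
from typing import Callable

def is_fractured(block: bytes) -> bool:
    """Toy check: the 4-byte stamp at the head must equal the 4-byte stamp at the tail."""
    return block[:4] != block[-4:]

def read_block(read_once: Callable[[], bytes]) -> bytes:
    """Read a block; if it looks fractured, reread once before giving up."""
    first = read_once()
    if not is_fractured(first):
        return first
    second = read_once()
    if is_fractured(second):
        raise ValueError("block considered permanently corrupt")
    return second

# Simulated device: the first read is torn, the second read is clean.
torn = b"\x00\x00\x00\x01" + bytes(8) + b"\x00\x00\x00\x02"
clean = b"\x00\x00\x00\x02" + bytes(8) + b"\x00\x00\x00\x02"
reads = iter([torn, clean])
block = read_block(lambda: next(reads))
print(is_fractured(block))  # False
```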

Mechanisms of hot backup:

1. The datafile checkpoint structure is frozen.
2. The backup is transparent to users.
3. Block splits can occur during hot backups: a data block may be written by DBW0 while it is being read by the O/S copy utility, which can cause fractured (split) blocks.
4. Redo for whole blocks is logged to the redo stream.
5. More redo is generated during hot backups.
6. The tablespace should be taken out of backup mode as soon as the file copy is completed.

Fractured blocks may occur when there is an update to a database block while the block is being backed up at the OS level. Because Oracle blocks normally consist of more than one OS block, it is possible that part of the block is backed up, the block is changed, and then the rest of the block is backed up. The two halves of the block are not consistent with each other.

To prevent this, Oracle copies the entire block to the redo logs. During recovery, the block is copied from the redo log and any further changes logged in the redo are applied to the recovered block.
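
The fractured-block scenario and the whole-block redo fix can be illustrated with a toy Python model. The block layout here (a version stamp at the head and tail), the 4096-byte OS chunk size, and all function names are assumptions for illustration only, not Oracle's real block format.

```python
BLOCK_SIZE = 8192  # Oracle block size used in this post
OS_CHUNK = 4096    # assumed granularity of the OS copy utility

def make_block(version: int) -> bytes:
    """Toy block: 4-byte version stamp at head and tail, zero filler in between."""
    stamp = version.to_bytes(4, "big")
    return stamp + bytes(BLOCK_SIZE - 8) + stamp

def is_fractured(block: bytes) -> bool:
    return block[:4] != block[-4:]

def os_copy_during_write(old: bytes, new: bytes) -> bytes:
    """The first chunk is copied before the block is rewritten, the second chunk after."""
    return old[:OS_CHUNK] + new[OS_CHUNK:]

old, new = make_block(1), make_block(2)
copied = os_copy_during_write(old, new)
print(is_fractured(copied))  # True: the two halves come from different versions

# In backup mode the whole block image is logged to redo, so recovery can
# replace the torn copy with that image before applying later changes.
redo_block_image = new
recovered = redo_block_image
print(is_fractured(recovered))  # False
```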

If the database crashes during a hot backup, files in hot backup mode will require media recovery on startup. Take the data files out of hot backup mode; V$BACKUP shows which data files are in hot backup mode.

If any redo logs are missing, recovery is impossible.

To take a tablespace out of backup mode on startup, mount the database and issue the following command:
alter database datafile 'xxxx' end backup;
This updates the data file header and the control file.

Alternatively, you can issue the RECOVER DATABASE command but this will involve checking all the redo logs for missing transactions since the hot backup started.

When taking hot backups, securing all the redo necessary to restore the datafiles to a consistent point in time is critical. To do this, follow these steps:
1. Alter system switch logfile. Make note of the current log.
2. Put tablespaces in backup mode one by one and copy at OS level.
3. After all files have been backed up, make a backup copy of the control file.
4. Issue an ARCHIVE LOG CURRENT so that all changes in the online logs are archived. Make note of the last log to be archived.

The backup set now consists of all datafiles, the control file backup, and all the archive logs from the current log in step 1 to the last archived log in step 4. These are all archive logs that could possibly contain information needed to recover the backup set.
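
The four steps above can be sketched as a small generator that emits the SQL statements in order. This is a minimal illustration, not a complete backup tool: the tablespace names and the control file path are hypothetical, and the OS-level copy is left as a comment line.

```python
def hot_backup_script(tablespaces, controlfile_copy):
    """Return the SQL statements, in order, for a user-managed hot backup."""
    steps = ["ALTER SYSTEM SWITCH LOGFILE;"]               # step 1
    for ts in tablespaces:                                 # step 2, one by one
        steps.append(f"ALTER TABLESPACE {ts} BEGIN BACKUP;")
        steps.append(f"-- copy the datafiles of {ts} at OS level here")
        steps.append(f"ALTER TABLESPACE {ts} END BACKUP;")
    # step 3: back up the control file after all datafiles are copied
    steps.append(f"ALTER DATABASE BACKUP CONTROLFILE TO '{controlfile_copy}';")
    steps.append("ALTER SYSTEM ARCHIVE LOG CURRENT;")      # step 4
    return steps

# Hypothetical tablespace names and target path, for illustration only.
for line in hot_backup_script(["USERS", "SYSTEM"], "/backup/control.bkp"):
    print(line)
```

Ending backup mode for each tablespace right after its copy keeps the period of extra redo generation as short as possible.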

Restore From a Hot Backup
When taking hot backups, the file header is frozen while the file is being copied. This means that each data file can have a different checkpoint in your backup set. To make the backup set consistent, all files need to be recovered until their SCNs match again and the fuzzy bits have been reset.

Backup scn, time: Updated when executing BEGIN BACKUP on the tablespace. RMAN does not update this field.
(The next few lines in the dump are incorrectly indented, erroneously suggesting they belong to the backup SCN.)

select fhfno,fhscn,fhbcp_scn from x$kcvfh;

Checkpoint structure:
Records the last checkpoint information.
Frozen when the file is in hot backup mode.

Backup checkpoint SCN:
Updated during hot backups.
Used in conjunction with the alter database end backup command.

Creation SCN is when the file was created. No redo exists prior to this SCN for this file. Backup taken at SCN is updated at BEGIN BACKUP and indicates when a hot backup started.

Resetlogs Data lists the timestamp of the most recent resetlogs and the SCN of the file after resetlogs. This information is compared with controlfile ( database entry ) information when the file is brought online.
Status Definition ( from kcv3.h )
#define KCVFHHBP 0x01 /* hotbackup-in-progress on file (fuzzy file) */
#define KCVFHOFZ 0x04 /* Online FuZzy because it was online and db open */
#define KCVFHMFZ 0x10 /* Media recovery FuZzy – file in media recovery */
#define KCVFHAFZ 0x40 /* Absolutely FuZzy – fuzzyness from file scan */

A hot or online backup is taken while the database is open. ARCHIVELOG mode is a prerequisite. For manual hot backups, the tablespaces involved must be placed in backup mode. With RMAN online backups, a different mechanism ensures block consistency, so you do not place tablespaces into any special mode for backup. There are usually two steps to performing a media recovery. The optional first step is restore, which is a simple copying of files from the backup system to the database locations. (RMAN may need to assemble the file from several incremental RMAN backups.) The second step is the recover step, when the server applies the archived and online redo logs. Normally when people refer to recovery, they are referring to the restore phase.

Hot backup in progress (bit 0x01): Applies only to hot backups made using utilities outside the kernel. This flag indicates that the file is in a fuzzy state from its checkpoint until its end hot backup marker. It is set at begin hot backup. Recovery clears this flag when it crosses the end of hot backup marker.
Online fuzzy (bit 0x04): Indicates that the file is in a fuzzy state from its checkpoint until the crash recovery marker. It is set when the data file is first opened, brought online, or made read-write. It is cleared when the data file is closed, taken offline (normal), or made read-only.

The database cannot be shut down cleanly if a tablespace is in hot backup mode:
ORA-01149: cannot shutdown - file 1 has online backup set
ORA-01110: data file 1: '/u02/oracle/or8i/system01.dbf'

Opening the database with a file still in backup mode:

SQL> startup
ORACLE instance started.

Total System Global Area 1073738888 bytes
Fixed Size                  9143432 bytes
Variable Size             792723456 bytes
Database Buffers          268435456 bytes
Redo Buffers                3436544 bytes
Database mounted.
ORA-10873: file 1 needs to be either taken out of backup mode or media recovered
ORA-01110: data file 1: '/u01/app/oracle/oradata/ANBOB19C/system01.dbf'

Solution:

SQL> alter database end backup;
Database altered.

SQL> alter database open;
Database altered.

kcvfhsta

#define KCVFH_FUZZY (KCVFHHBP|KCVFHOFZ|KCVFHMFZ|KCVFHAFZ|KCVFHPCP|KCVFHSBY)
#define KCVFH_OOFUZZY (KCVFHHBP|KCVFHMFZ|KCVFHAFZ|KCVFHPCP|KCVFHSBY)
#define KCVFH_NMFUZZY (KCVFH_FUZZY^KCVFHMFZ^KCVFHSBY) /* non-MR fuzzy */
#define KCVFHBCP 0x100 /* Bad Checkpoint – no enabled thread bitvec */
#define KCVFHFMH 0x200 /* Freshly Munged Header. resetlogs not finished */
#define KCVFHXCH 0x400 /* eXternally CacHed by operating system */
#define KCVFHZBA 0x800 /* Zeroed Blocks Allowed */
#define KCVFHPCP 0x1000 /* Proxy Copy in Progress */
#define KCVFHRBS 0x2000 /* does kcvfhrdb point to bootstrap$ ? */
#define KCVFHSBY 0x4000 /* media rcv fuzzy due to standby apply */
#define KCVFHL0C 0x8000 /* Incremental level 0 copy */
/* bits that can be set in kcvfh_status2 for move datafile */
#define KCVFH_SECONDARY 0x01 /* secondary file for datafile move */
#define KCVFH_NOT_CURRENT 0x02 /* file not current after datafile move */
#define KCVMAXBKSEC 256 /* maximum sections in a multi-section backup */
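
The bit definitions above can be decoded with a few lines of Python. This is a sketch: the bit values are copied from the #define lines in this post, while the sample status values passed in are made up for illustration.

```python
# Bit values copied from the #define lines above.
FLAGS = {
    0x01: "KCVFHHBP", 0x04: "KCVFHOFZ", 0x10: "KCVFHMFZ", 0x40: "KCVFHAFZ",
    0x100: "KCVFHBCP", 0x200: "KCVFHFMH", 0x400: "KCVFHXCH", 0x800: "KCVFHZBA",
    0x1000: "KCVFHPCP", 0x2000: "KCVFHRBS", 0x4000: "KCVFHSBY", 0x8000: "KCVFHL0C",
}

# KCVFH_FUZZY = KCVFHHBP|KCVFHOFZ|KCVFHMFZ|KCVFHAFZ|KCVFHPCP|KCVFHSBY
KCVFH_FUZZY = 0x01 | 0x04 | 0x10 | 0x40 | 0x1000 | 0x4000

def decode_status(status: int) -> list:
    """Return the names of all flag bits set in a status value."""
    return [name for bit, name in FLAGS.items() if status & bit]

def is_fuzzy(status: int) -> bool:
    """True when any of the bits grouped under KCVFH_FUZZY is set."""
    return bool(status & KCVFH_FUZZY)

def in_hot_backup(status: int) -> bool:
    """True when the hot-backup-in-progress bit (0x01) is set."""
    return bool(status & 0x01)

# Made-up sample values, for illustration only.
for sample in (0x2004, 0x2005, 0x0000):
    print(hex(sample), decode_status(sample), is_fuzzy(sample), in_hot_backup(sample))
```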

↧

Troubleshooting ORA-04031: unable to allocate 13840 bytes of shared memory “ges resource dynamic” in 12C+

August 3, 2020, 8:58 am

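From 12c onward, "ges resource dynamic" can keep growing until the shared pool exceeds the manually configured shared pool size and reaches sga_max_size, at which point ORA-4031 is raised. Several Oracle bugs are related to this. Recently this problem caused LMD to hang and block the foreground processes of the other instances; shutting down the affected node restored service temporarily. A brief record follows.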

#db alert log

2020-08-01 09:19:24.538000 +08:00
Thread 3 advanced to log sequence 10912 (LGWR switch)
  Current log# 16 seq# 10912 mem# 0: +DATADG/ANBOB/ONLINELOG/group_16.280.1025954009
Archived Log entry 121814 added for T-3.S-10911 ID 0xd370ac7 LAD:1
2020-08-01 09:19:25.785000 +08:00
TT02: Standby redo logfile selected for thread 3 sequence 10912 for destination LOG_ARCHIVE_DEST_2
2020-08-01 09:32:07.640000 +08:00
Errors in file /oracle/app/oracle/diag/rdbms/ANBOB/anbob3/trace/billb3_lmd2_45682.trc  (incident=256225):
ORA-04031: unable to allocate 13840 bytes of shared memory ("shared pool","unknown object","sga heap(1,0)","ges resource dynamic")
Incident details in: /oracle/app/oracle/diag/rdbms/ANBOB/anbob3/incident/incdir_256225/billb3_lmd2_45682_i256225.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2020-08-01 09:32:10.382000 +08:00
Errors in file /oracle/app/oracle/diag/rdbms/ANBOB/anbob3/trace/billb3_lmd2_45682.trc  (incident=256226):
ORA-04031: unable to allocate 13840 bytes of shared memory ("shared pool","unknown object","sga heap(1,0)","ges resource dynamic")
Incident details in: /oracle/app/oracle/diag/rdbms/ANBOB/anbob3/incident/incdir_256226/billb3_lmd2_45682_i256226.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2020-08-01 09:32:13.441000 +08:00
Errors in file /oracle/app/oracle/diag/rdbms/ANBOB/anbob3/trace/billb3_lmd2_45682.trc  (incident=256227):
ORA-04031: unable to allocate 13840 bytes of shared memory ("shared pool","unknown object","sga heap(1,0)","ges resource dynamic")

# The heap dump in each trace file written for these ORA-4031 errors can be found by searching for the keyword "TOP"

      *** 2020-08-01T09:32:07.644132+08:00
        =================================
        Begin 4031 Diagnostic Information
/TOP                                                                                                                                                                                                       
         7f7f4e265000-7f7f4e26e000 r--s 0011b000 fe:07 1100698                    /oracle/app/oracle/product/12.2.0/db_1/lib/libskgxp12.so
         7f7f4e26e000-7f7f4e26f000 r--s 0011b000 fe:07 1100698                    /oracle/app/oracle/product/12.2.0/db_1/lib/libskgxp12.so
         7f7f4e26f000-7f7f4e271000 r--s 18592000 fe:07 1101565                    /oracle/app/oracle/product/12.2.0/db_1/bin/oracle
         7f7f4e271000-7f7f4e272000 rw-s 00000000 fe:07 4743170                    /oracle/app/oracle/product/12.2.0/db_1/dbs/hc_billb3.dat
         7f7f4e272000-7f7f4e275000 rw-p 00000000 00:00 0
         7f7f4e275000-7f7f4e276000 r--p 00020000 fe:00 1155074                    /lib64/ld-2.22.so
         7f7f4e276000-7f7f4e277000 rw-p 00021000 fe:00 1155074                    /lib64/ld-2.22.so
         7f7f4e277000-7f7f4e278000 rw-p 00000000 00:00 0
         7fff927e7000-7fff92822000 rw-p 00000000 00:00 0                          [stack]
         7fff92919000-7fff9291c000 r--p 00000000 00:00 0                          [vvar]
         7fff9291c000-7fff9291e000 r-xp 00000000 00:00 0                          [vdso]
         ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
         *********************** End of process map dump ****************
         Maximum map count configured per process:  65530
3<       ***** process_map_dump *****
        [TOC00005-END]
        ==============================================
        TOP 10 MEMORY USES FOR SGA HEAP SUB POOL 1
        ----------------------------------------------
        "ges resource dynamic           "    14 GB 62%
        "ges enqueues                   "  6642 MB 29%
        "free memory                    "   721 MB  3%
        "gcs resources                  "   352 MB  2%
        "gcs shadows                    "   225 MB  1%
        "gc name table                  "   128 MB  1%
        "gcs resv res hash bucket       "   107 MB  0%
        "db_block_hash_buckets          "   101 MB  0%
        "ges resource permanent         "    49 MB  0%
        "Checkpoint queue               "    46 MB  0%
             -----------------------------------------
        free memory                         721 MB
        memory alloc.                        22 GB
        Sub total                            23 GB
        ==============================================
        TOP 10 MAXIMUM MEMORY USES FOR SGA HEAP SUB POOL 1
        ----------------------------------------------
        "ges resource dynamic           "    14 GB
        "ges enqueues                   "  6642 MB
        "free memory                    "  1201 MB
        "gcs resources                  "   352 MB
        "gcs shadows                    "   225 MB
        "gc name table                  "   128 MB
        "gcs resv res hash bucket       "   107 MB
        "db_block_hash_buckets          "   101 MB
        "ges resource permanent         "    49 MB
        "Checkpoint queue               "    46 MB
        ==============================================
        TOP 10 MEMORY USES FOR SGA HEAP SUB POOL 2
        ----------------------------------------------
        "ges enqueues                   "  2909 MB 63%
        "free memory                    "   443 MB 10%
        "gcs resources                  "   352 MB  8%
        "gcs shadows                    "   224 MB  5%
        "gc name table                  "   128 MB  3%
        "gcs resv res hash bucket       "   107 MB  2%
        "db_block_hash_buckets          "   102 MB  2%
        "ges resource permanent         "    49 MB  1%
        "Checkpoint queue               "    46 MB  1%
        "event statistics per sess      "    26 MB  1%
             -----------------------------------------
        free memory                         443 MB
        memory alloc.                      4165 MB
        Sub total                          4608 MB
        ==============================================
        TOP 10 MAXIMUM MEMORY USES FOR SGA HEAP SUB POOL 2
        ----------------------------------------------
        "ges enqueues                   "  2909 MB
        "free memory                    "   715 MB
        "gcs resources                  "   352 MB
        "gcs shadows                    "   224 MB
        "gc name table                  "   128 MB
        "gcs resv res hash bucket       "   107 MB
        "db_block_hash_buckets          "   102 MB
        "ges resource permanent         "    49 MB
        "Checkpoint queue               "    46 MB
        "event statistics per sess      "    26 MB
        ==============================================
        TOP 10 MEMORY USES FOR SGA HEAP SUB POOL 3
        ----------------------------------------------
        "ges resource dynamic           "    14 GB 61%
        "ges enqueues                   "  6832 MB 29%
        "free memory                    "  1011 MB  4%
        "gcs resources                  "   353 MB  1%
        "gcs shadows                    "   226 MB  1%
        "gc name table                  "   128 MB  1%
        "gcs resv res hash bucket       "   106 MB  0%
        "db_block_hash_buckets          "   101 MB  0%
        "ges resource permanent         "    49 MB  0%
        "Checkpoint queue               "    46 MB  0%
             -----------------------------------------
        free memory                        1011 MB
        memory alloc.                        22 GB
        Sub total                            23 GB
        ==============================================
        TOP 10 MAXIMUM MEMORY USES FOR SGA HEAP SUB POOL 3
        ----------------------------------------------
        "ges resource dynamic           "    14 GB
        "ges enqueues                   "  6832 MB
        "free memory                    "  1232 MB
        "KGH: NO ACCESS                 "   512 MB
        "gcs resources                  "   353 MB
        "gcs shadows                    "   226 MB
        "gc name table                  "   128 MB
        "gcs resv res hash bucket       "   106 MB
        "db_block_hash_buckets          "   101 MB
        "ges resource permanent         "    49 MB
        ==============================================
...
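
Searching for "TOP" by hand works for one trace file; for many files the same sections can be parsed programmatically. The following is a minimal sketch that assumes the line layout shown in the dump above (quoted component name, size in MB or GB, percentage); the sample text is a shortened excerpt of that dump.

```python
import re

SAMPLE = '''\
TOP 10 MEMORY USES FOR SGA HEAP SUB POOL 1
----------------------------------------------
"ges resource dynamic           "    14 GB 62%
"ges enqueues                   "  6642 MB 29%
"free memory                    "   721 MB  3%
TOP 10 MAXIMUM MEMORY USES FOR SGA HEAP SUB POOL 1
"ges resource dynamic           "    14 GB
TOP 10 MEMORY USES FOR SGA HEAP SUB POOL 2
----------------------------------------------
"ges enqueues                   "  2909 MB 63%
"free memory                    "   443 MB 10%
'''

HEADER = re.compile(r"TOP \d+ MEMORY USES FOR SGA HEAP SUB POOL (\d+)")
ENTRY = re.compile(r'"([^"]+)"\s+(\d+)\s+(MB|GB)\s+(\d+)%')

def parse_top_uses(text):
    """Return {sub_pool: [(component, size_mb, percent), ...]}."""
    pools, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            current = int(header.group(1))
            pools[current] = []
        elif line.startswith("TOP "):
            current = None          # a different section, e.g. the MAXIMUM list
        else:
            entry = ENTRY.match(line)
            if entry and current is not None:
                name, size, unit, pct = entry.groups()
                size_mb = int(size) * (1024 if unit == "GB" else 1)
                pools[current].append((name.strip(), size_mb, int(pct)))
    return pools

for pool, uses in parse_top_uses(SAMPLE).items():
    print(pool, uses[0])
```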

# grep the LMHB trace files for a few bug keywords

# grep "library cache pin wait" *lmhb*.trc
billb3_lmhb_2380.trc:kjgcr_ServiceGCR: KJGCR_METRICS: Local metric library cache pin wait check, id 11 failed
billb3_lmhb_2380.trc:kjgcr_ChkGlobalMetric: metric 11 (library cache pin wait check) failed for dbname ANBOB, inst 3, node 3
billb3_lmhb_2380.trc:kjgcr_ChkGlobalMetric: metric 11 (library cache pin wait check) succeed for dbname ANBOB, inst 1, node 1
billb3_lmhb_2380.trc:kjgcr_ServiceGCR: KJGCR_METRICS: Local metric library cache pin wait check, id 11 succeed
billb3_lmhb_2380.trc:kjgcr_ServiceGCR: KJGCR_METRICS: Local metric library cache pin wait check, id 11 failed
billb3_lmhb_2380.trc:kjgcr_ChkGlobalMetric: metric 11 (library cache pin wait check) failed for dbname ANBOB, inst 3, node 3
billb3_lmhb_2380.trc:kjgcr_ChkGlobalMetric: metric 11 (library cache pin wait check) succeed for dbname ANBOB, inst 1, node 1
billb3_lmhb_2380.trc:kjgcr_ServiceGCR: KJGCR_METRICS: Local metric library cache pin wait check, id 11 succeed
billb3_lmhb_2380.trc:kjgcr_ServiceGCR: KJGCR_METRICS: Local metric library cache pin wait check, id 11 failed
billb3_lmhb_2380.trc:kjgcr_ChkGlobalMetric: metric 11 (library cache pin wait check) failed for dbname ANBOB, inst 3, node 3
billb3_lmhb_2380.trc:kjgcr_ChkGlobalMetric: metric 11 (library cache pin wait check) succeed for dbname ANBOB, inst 1, node 1
billb3_lmhb_2380.trc:kjgcr_ServiceGCR: KJGCR_METRICS: Local metric library cache pin wait check, id 11 succeed
billb3_lmhb_2380.trc:kjgcr_ServiceGCR: KJGCR_METRICS: Local metric library cache pin wait check, id 11 failed
billb3_lmhb_2380.trc:kjgcr_ChkGlobalMetric: metric 11 (library cache pin wait check) failed for dbname ANBOB, inst 3, node 3
billb3_lmhb_2380.trc:kjgcr_ChkGlobalMetric: metric 11 (library cache pin wait check) succeed for dbname ANBOB, inst 1, node 1
billb3_lmhb_2380.trc:kjgcr_ServiceGCR: KJGCR_METRICS: Local metric library cache pin wait check, id 11 succeed
billb3_lmhb_2380.trc:kjgcr_ServiceGCR: KJGCR_METRICS: Local metric library cache pin wait check, id 11 failed
billb3_lmhb_2380.trc:kjgcr_ChkGlobalMetric: metric 11 (library cache pin wait check) failed for dbname ANBOB, inst 3, node 3
...

# grep kjgcr_GrowR *lmhb*.trc
billb3_lmhb_2380.trc:kjgcr_GrowResourceCache: LCP waits high, allowing res cache growth
billb3_lmhb_2380.trc:kjgcr_GrowResourceCache: LCP waits high, allowing res cache growth
billb3_lmhb_2380.trc:kjgcr_GrowResourceCache: LCP waits high, allowing res cache growth
billb3_lmhb_2380.trc:kjgcr_GrowResourceCache: LCP waits high, allowing res cache growth
billb3_lmhb_2380.trc:kjgcr_GrowResourceCache: LCP waits high, allowing res cache growth
billb3_lmhb_2380.trc:kjgcr_GrowResourceCache: LCP waits high, allowing res cache growth
billb3_lmhb_2380.trc:kjgcr_GrowResourceCache: LCP waits high, allowing res cache growth
billb3_lmhb_2380.trc:kjgcr_GrowResourceCache: LCP waits high, allowing res cache growth
billb3_lmhb_2380.trc:kjgcr_GrowResourceCache: LCP waits high, allowing res cache growth
billb3_lmhb_2380.trc:kjgcr_GrowResourceCache: LCP waits high, allowing res cache growth
billb3_lmhb_2380.trc:kjgcr_GrowResourceCache: LCP waits high, allowing res cache growth
billb3_lmhb_2380.trc:kjgcr_GrowResourceCache: LCP waits high, allowing res cache growth
billb3_lmhb_2380.trc:kjgcr_GrowResourceCache: LCP waits high, allowing res cache growth

# trace file 

*** 2020-08-01T10:15:45.376086+08:00
kjgcr_ChkGlobalMetric: metric 7 (check lck heartbeat) failed for dbname ANBOB, inst 2, node 2
*** 2020-08-01T10:15:45.380376+08:00
kjgcr_ChkGlobalMetric: metric 19 (check lgwr heartbeat) succeed for dbname +ASM, inst 3, node 3
*** 2020-08-01T10:15:45.380413+08:00
kjgcr_ChkGlobalMetric: metric 19 (check lgwr heartbeat) failed for dbname ANBOB, inst 2, node 2
*** 2020-08-01T10:15:47.362926+08:00
==============================
LCK1 (ospid: 45689) has not moved for 1249 sec (1596248147.1596246898)
*** 2020-08-01T10:15:47.367616+08:00
==============================
LGWR (ospid: 45718) has not moved for 1241 sec (1596248147.1596246906)
*** 2020-08-01T10:15:47.367682+08:00
kjgcr_ChkGlobalMetric: metric 4 (check lmd heartbeat) failed for dbname ANBOB, inst 3, node 3
*** 2020-08-01T10:15:47.367708+08:00
kjgcr_ChkGlobalMetric: metric 4 (check lmd heartbeat) succeed for dbname +ASM, inst 3, node 3
*** 2020-08-01T10:15:47.367738+08:00
kjgcr_ChkGlobalMetric: metric 6 (check lmon heartbeat) failed for dbname ANBOB, inst 3, node 3
*** 2020-08-01T10:15:47.367756+08:00
kjgcr_ChkGlobalMetric: metric 6 (check lmon heartbeat) succeed for dbname +ASM, inst 3, node 3
*** 2020-08-01T10:15:47.367778+08:00
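
The "has not moved for N sec" lines can be turned into wall-clock times to see when each process stalled. This sketch assumes that the pair of numbers in parentheses is the current epoch time followed by the epoch time of the last heartbeat; that reading is an inference from the fact that their difference equals the reported number of seconds, not something documented here.

```python
import re
from datetime import datetime, timedelta, timezone

# Two lines copied from the trace above.
TRACE = """\
LCK1 (ospid: 45689) has not moved for 1249 sec (1596248147.1596246898)
LGWR (ospid: 45718) has not moved for 1241 sec (1596248147.1596246906)
"""

STALL = re.compile(
    r"(\w+) \(ospid: (\d+)\) has not moved for (\d+) sec \((\d+)\.(\d+)\)")
TRACE_TZ = timezone(timedelta(hours=8))  # the trace timestamps use +08:00

def stalled_processes(text):
    """Yield (process, ospid, stalled_seconds, last_seen) for each stall line."""
    for m in STALL.finditer(text):
        name, ospid, secs, _now, last = m.groups()
        last_seen = datetime.fromtimestamp(int(last), TRACE_TZ)
        yield name, int(ospid), int(secs), last_seen

for name, ospid, secs, last_seen in stalled_processes(TRACE):
    print(f"{name} pid={ospid} stalled {secs}s since {last_seen.isoformat()}")
```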

This system runs 12c R2 with the April 2018 RU, and two bugs are relevant to it.

Bugs related to this problem:
1. For Oracle versions >= 19.1 but below 20.1, and ASM
Unpublished Bug 30497120 - ASM Rebalance Produces AD Enqueue Leak. In ASM 19c, frequent rebalance operations can make "ges resource dynamic" grow until ORA-4031 occurs.
Workaround: set parameter _lm_broadcast_res=disable.

2. For Oracle Versions >= 12.2 but BELOW 19.1
Bug:26405036 – VERY HIGH “GES ENQUEUES” ON THE SHARED POOL
Symptoms: the LMHB trace shows "memory load check" and "library cache pin wait check" failures, and the heap dump shows high "ges resource dynamic" usage.

Workaround: on 12.2 or above, trigger a pseudo reconfiguration with the commands below.
SQL> oradebug setmypid
SQL> oradebug lkdebug -m reconfig lkdebug

Bug 27824540 - ORA-04031 ("SHARED POOL","UNKNOWN OBJECT","SGA HEAP(1,0)","GES RESOURCE DYNAMIC")
Symptoms: "ges resource dynamic" usage is high in the heap dump, and the LMHB trace file shows kjgcr_GrowResourceCache and "(check lck heartbeat) failed".
Workaround: there are 2 possible workarounds.
1) Disable action 11.
SQL> oradebug dyn_gcr -a 11 -disable
Note: this oradebug command is available on 12.2 and later.
2) Disable the GES resource cache by setting the initialization parameter "_ges_direct_free" to TRUE.
Note: disabling the GES resource cache completely may affect the TM resources used by inserts, and changing this parameter may also expose another bug, Bug 30998759, which is not fixed until 19.8 and the 21c base release.

3. For Oracle Versions >= 12.1.0.2 but BELOW 12.2
Unpublished Bug:21260431 – GETTING ORA-4031 AFTER 12C UPGRADE
Symptoms: "ges resource dynamic" usage is high in the heap dump, and the number of rows in gv$ges_resource only ever increases.
Workaround: none; install the one-off patch.

Unpublished Bug:21373473 – INSTANCE TERMINATED AS LMD0 AND LMD2 HUNG FOR MORE THAN 70 SECS
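Symptoms: "ges resource dynamic" usage is high, gv$ges_resource holds more rows than expected, and cached DX and BB locks are not released. LMD may become unresponsive.
Workaround: set _ges_direct_free_res_type="CTARAHDXBB"; this requires an instance restart.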

4. For Oracle Versions = 12.1
Unpublished BUG:27860058, Fix For Bug 26405036 On 12.1, shown above.
Workaround: the only options are to restart the instance or to install the one-off patch.

Unpublished Bug:28300808, Fix For Bug 27824540 On 12.1, shown above.
Workaround: disable the GES resource cache by setting "_ges_direct_free" to TRUE (the side effects are described above), or install the one-off patch.

Reference: Error ORA-04031 in the Shared Pool with Huge Allocation in Memory Type "ges resource dynamic" or "ges enqueues" memory (Doc ID 2631592.1)

↧

Does Python Have a Built-in Database? SQLite3 (moss flowers are as small as grains of rice, yet they too bloom like peonies)

August 4, 2020, 8:27 am

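SQLite is an ultra-lightweight database. Despite its small footprint it gives up neither speed nor reliability, and it has stood the test of time: it currently holds a steady place in the top 8 of the database popularity rankings. It is also an open-source relational database that anyone may use for commercial or non-commercial purposes. More than 1 trillion SQLite databases are in active use, the maximum supported database size is 140 TB, and the executable is only 2-3 MB. It is a single-file database that needs no configuration, yet it supports SQL and the basic features common to relational databases. If you are a developer it is probably familiar, and if you use Python for development or operations you should definitely know it. It is commonly used for embedded systems, IoT, internal testing, demos, data science, data transfer and similar purposes.

Most importantly, SQLite is built into Python as a library. In other words, you do not need to install any server or client software, and you do not need to keep anything running as a service. As soon as you import the library in Python and start coding, you have a relational database management system.

A short demonstration follows.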

import sqlite3 as sl
import pandas as pd

# Create the database on the filesystem:
# conn = sl.connect('my-test.db')
# Create the database in memory:
conn = sl.connect(':memory:')

# Create a table; "with conn" commits on success and rolls back on error
with conn:
    # conn.execute('DROP TABLE events')
    conn.execute(
        'CREATE TABLE events(ts, msg, PRIMARY KEY(ts, msg))')

try:
    # Insert the values in a single transaction
    with conn:
        conn.executemany('INSERT INTO events VALUES (?, ?)', [
            (1, 'foo'),
            (2, 'bar'),
            (3, 'baz'),
            (5, 'foo'),
        ])
except (sl.OperationalError, sl.IntegrityError) as e:
    # If any insert fails, the transaction is rolled back and no row is inserted
    print('Could not complete operation:', e)

# Print the inserted rows
for row in conn.execute('SELECT * FROM events'):
    print(row)


PS D:\code\EchartsShow> & D:/Python/Python38/python.exe d:/code/EchartsShow/testsqlite.py
(1, 'foo')
(2, 'bar')
(3, 'baz')
(5, 'foo')

More importantly, it can be combined with pandas for data science analysis.

df_skill = pd.DataFrame({
    'user_id': [1,1,2,2,3,3,3],
    'skill': ['Network Security', 'Algorithm Development', 'Network Security', 'Java', 'Python', 'Data Science', 'Machine Learning']
})

# call to_sql() method of the data frame to save it into the database.
df_skill.to_sql('SKILL', conn)

# join tables 
df = pd.read_sql('''
    SELECT s.user_id, u.ts, u.msg, s.skill 
    FROM events u LEFT JOIN SKILL s ON u.ts = s.user_id
''', conn)

print(df)

# save join result to new table
df.to_sql('events_SKILL', conn)

# close connection
conn.close()

PS D:\code\EchartsShow> & D:/Python/Python38/python.exe d:/code/EchartsShow/testsqlite.py
(1, 'foo')
(2, 'bar')
(3, 'baz')
(5, 'foo')
   user_id  ts  msg                  skill
0      1.0   1  foo  Algorithm Development
1      1.0   1  foo       Network Security
2      2.0   2  bar                   Java
3      2.0   2  bar       Network Security
4      3.0   3  baz           Data Science
5      3.0   3  baz       Machine Learning
6      3.0   3  baz                 Python
7      NaN   5  foo                   None

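Query results can be loaded directly into a pandas DataFrame or stored in the SQLite database, and joins, inserts, deletes, updates and queries all work. Python, SQL and pandas are all common tools for data science. Further features are left for you to explore.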
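
The demonstration catches IntegrityError but never triggers it. The following self-contained sketch shows the failure path: a duplicate primary key makes the whole batch roll back, because the connection used as a context manager commits on success and rolls back when an exception escapes. The sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
with conn:
    conn.execute("CREATE TABLE events(ts, msg, PRIMARY KEY(ts, msg))")

# The last row repeats the key of the first one on purpose.
rows = [(1, "foo"), (2, "bar"), (1, "foo")]

try:
    with conn:  # commits on success, rolls back if an exception escapes
        conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
except sqlite3.IntegrityError as e:
    print("Could not complete operation:", e)

# Count what survived the failed transaction.
count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print("rows stored:", count)
conn.close()
```

Because the first two inserts belong to the same transaction as the failing one, none of the three rows is stored.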

↧

19c: Running root.sh on a Non-First Node Reports "ERROR 4 OPENING DOM ASM/SELF IN 0xNNNN"

August 10, 2020, 6:33 pm

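Yesterday a customer installing 19c ran root.sh on a node other than the first and got the error below, although a check showed that all instances had started normally.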

File: /cfgtoollogs/crsconfig/rootcrs_rac-node2.log

2020/08/10 15:59:28 CLSRSC-594: Executing installation step 13 of 19: 'InstallAFD'.
2020/08/10 16:00:04 CLSRSC-594: Executing installation step 14 of 19: 'InstallACFS'.
2020/08/10 16:00:46 CLSRSC-594: Executing installation step 15 of 19: 'InstallKA'.
2020/08/10 16:00:47 CLSRSC-594: Executing installation step 16 of 19: 'InitConfig'.
2020/08/10 16:00:55 CLSRSC-594: Executing installation step 17 of 19: 'StartCluster'.
2020/08/10 16:01:44 CLSRSC-343: Successfully started Oracle Clusterware stack
2020/08/10 16:01:44 CLSRSC-594: Executing installation step 18 of 19: 'ConfigNode'.
2020/08/10 16:01:56 CLSRSC-594: Executing installation step 19 of 19: 'PostConfig'.
2020/08/10 16:02:13 CLSRSC-325: Configure Oracle Grid Infrastructure for a Cluster ... succeeded
Error 4 opening dom ASM/Self in 0x9e3add0 <-------------
Domain name to open is ASM/Self <-------------
Error 4 opening dom ASM/Self in 0x9e3add0 <------------

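This message is caused by software defect Bug 28308320. It appears when root.sh is run on a remote node, does not affect installation or use, and can be ignored. It is fixed in 20c.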

↧

12c R2 DB Alert Log Frequently Prints "An internal routine has requested a dump of selected redo"

August 14, 2020, 3:19 am

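A local disk usage alert fired on an Oracle 12.2 four-node RAC running on SLES 11. Trace files containing redo dumps were being generated continuously under the DIAG directory, and the DB alert log kept printing the following message: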

2020-08-10T21:41:31.425544+08:00
*****************************************************************
An internal routine has requested a dump of selected redo.
This usually happens following a specific internal error, when
analysis of the redo logs will help Oracle Support with the
diagnosis.
It is recommended that you retain all the redo logs generated (by
all the instances) during the past 12 hours, in case additional
redo dumps are required to help with the diagnosis.
*****************************************************************

Trace file contents

*** 2020-08-10T21:41:03.876043+08:00
Dumping Short Stack
ksedsts()+346<-kcbgtcr()+28014<-ktrget2()+1056<-kdsgrp()+527<-qetlbr()+835<-qertbFetchByRowID()+1216<-qergiFetch()+567<-qersoProcessULS()+300<-qersoFetchSimple()+1433<-qersoFetch()+210<-opifch2()+3267<-kpoal8()+3490<-opiodr()+1229<-ttcpip()+1257<-opitsk()+1940
<-opiino()+941<-opiodr()+1229<-opidrv()+1021<-sou2o()+145<-opimai_real()+455<-ssthrdmain()+417<-main()+262<-__libc_start_main()+245
Dump redo command(s):
 ALTER SYSTEM DUMP REDO DBA MIN 32 2760934 DBA MAX 32 2760934 TIME MIN 1048108263 TIME MAX 1048110123
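
The DBA range in the dump command identifies the block involved: in "DBA MIN 32 2760934" the first number is the file number and the second the block number. A minimal sketch that extracts the pair and builds a DBA_EXTENTS lookup for the owning segment is shown below. The helper names are hypothetical, and the query is only one possible way to map a block to a segment; it can be slow on large databases.

```python
import re

# Command copied from the trace file above.
CMD = ("ALTER SYSTEM DUMP REDO DBA MIN 32 2760934 DBA MAX 32 2760934 "
       "TIME MIN 1048108263 TIME MAX 1048110123")

def parse_dba_range(cmd):
    """Return ((min_file, min_block), (max_file, max_block)) from a dump command."""
    m = re.search(r"DBA MIN (\d+) (\d+) DBA MAX (\d+) (\d+)", cmd)
    if m is None:
        raise ValueError("no DBA range found in command")
    f1, b1, f2, b2 = map(int, m.groups())
    return (f1, b1), (f2, b2)

def extent_lookup_sql(file_no, block_no):
    """Build a DBA_EXTENTS query that names the segment owning the block."""
    return (
        "SELECT owner, segment_name, segment_type FROM dba_extents "
        f"WHERE file_id = {file_no} "
        f"AND {block_no} BETWEEN block_id AND block_id + blocks - 1"
    )

(low_file, low_block), _ = parse_dba_range(CMD)
print(extent_lookup_sql(low_file, low_block))
```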

According to MOS:

You can ignore the message related to redo log if there is no error along with this information. Please note that for now, there is no way to disable the message by setting any event,  because these informative messages are useful in some cases if something goes wrong. However it is better to retain the redo/archive logs.

If you are seeing redundant dump of redo dump command and redo advisory message in alert.log, then you can apply patch 27028251.

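In other words, the message can be ignored if no errors accompany it. It is a bug in this version, the output cannot be disabled with an event, and a patch is available; it is fixed in 19.1. The real question is: how do you know whether something is actually wrong?

In this case the foreground sessions were reporting ORA-8103 on a corrupt block, which appears to be related. The object happened to be an index, and after the index was rebuilt the messages stopped. The way to identify the object behind ORA-8103: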

1. If ORA-08103 can be reproduced at will get a trace file:

From SQL*Plus execute:

alter session set max_dump_file_size=unlimited;
alter session set db_file_multiblock_read_count=1;
alter session set events 'immediate trace name trace_buffer_on level 1048576';
alter session set events '10200 trace name context forever, level 1';
alter session set events '10236 trace name context forever, level 1';
alter session set events '8103 trace name errorstack level 3';
alter session set tracefile_identifier='ORA8103';
----->>>> run the sql statement that causes the ORA-08103
alter session set events 'immediate trace name trace_buffer_off';
oradebug setmypid
oradebug tracefile_name

Identify the trace with the form of _ora__ORA8103.trc
2. If ORA-08103 is not reproducible at will, enable EVENT errorstack for ORA-08103 and event 10236 at system level, wait for the error to be reproduced and disable the events:

alter system set events '8103 trace name errorstack forever, level 3';
alter system set events '10236 trace name context forever, level 1';

Once the ORA-8103 error is reproduced, disable the events:

alter system set events '8103 trace name errorstack off';
alter system set events '10236 trace name context off';

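How to handle it
1. If the object is an index, simply rebuild it.
2. For a table data block: mark the block as corrupt with bbed and then skip the corrupt block using an event or a package; or extract the data from the good blocks by rowid with PL/SQL; or use bbed to replace the block and correct its rdba and obj#.
3. For a table segment header: use a tool that scans the datafiles to extract the data.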

The segment header block corrupted cause ORA-08103 issue
ora-01499 & ora-08103 caused by block corrupted or write loss a case

↧

Troubleshooting ORA-4031: "init_heap_kfsg" Consuming Large Amounts of Memory in 12c, 18c, 19c

August 14, 2020, 6:59 am

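Last week I shared "Troubleshooting ORA-04031: unable to allocate 13840 bytes of shared memory "ges resource dynamic" in 12C+". The current releases contain another bug that hits a wide range of systems: it also surfaces as ORA-4031, and the memory area consuming the most is init_heap_kfsg, as shown in the figure below.

The common trigger at present is running an RMAN backup. It affects 12c through 19c, up until the April 2020 RUs, 19.7 and 20c. The error appears in the DB alert log or in the RMAN log as follows: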

ORA-04031: unable to allocate 40 bytes of shared memory ("shared pool","unknown object","KGLH0^aeb4764","kglHeapInitialize:temp")

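The heap dump in the trace shows that most of the memory is held by init_heap_kfsg, and MOS readily confirms this as Bug 30173113. On 12cR2 with the Apr 2018 RU there is also another bug, Bug 31341859.

Before 12c this memory area was commonly seen in ORA-4031 errors on ASM instances, with an error message like this: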

ORA-04031: unable to allocate 10024 bytes of shared memory ("shared pool","unknown object","init_heap_kfsg","kfgpn gst")

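At that time it was Bug 10089333 in 11g and Bug 13888380 in 10.2.0.5.

Hopefully this serves as a warning so that you can take preventive action in advance.

If you cannot resolve it yourself, please get in touch using the contact details on the www.anbob.com home page.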

↧

Troubleshooting the vi Command: ex: 0602-101 Out of memory saving lines for undo

September 1, 2020, 8:30 pm

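vi is the most commonly used command on Unix and Linux systems. DBAs viewing log files such as the DB alert log on a server often run into the error "ex: 0602-101 Out of memory saving lines for undo." and sometimes have to fall back on tail and more, or even filter the file directly with awk and sed. This note records how to fix the error so that vi can open the file, even one that is hundreds of MB in size.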

[oracle@weejar:/oracle/app/oracle/diag/rdbms/prianbob/anbob1/trace> vi alert*
 2 files to edit.

 ex: 0602-101 Out of memory saving lines for undo.
:q
 There is 1 more file to edit.
:q

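Method 1: use EXINIT or a .exrc file. Where settings conflict, the EXINIT environment variable takes precedence over the .exrc file; vi reads the EXINIT variable first when it starts.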

[oracle@weejar:/oracle/app/oracle/diag/rdbms/prianbob/anbob1/trace> export EXINIT="set ll=20000000 dir=/tmp"
[oracle@weejar:/oracle/app/oracle/diag/rdbms/prianbob/anbob1/trace> vi alert*
 2 files to edit.
"alert_anbob1.log" 2713611 lines, 151123933 characters 
Mon Aug 06 17:00:29 2018
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Private Interface 'en5' configured from GPnP for use as a private interconnect.
  [name='en5', type=1, ip=169.254.47.127, mac=7c-fe-90-cd-11-3b, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
Public Interface 'en4' configured from GPnP for use as a public interface.

Method 2: use the -y option

[oracle@weejar:/oracle/app/oracle/diag/rdbms/prianbob/anbob1/trace> vi -y99999999 alert*
"alert_anbob1.log" 2714286 lines, 151153022 characters 
Mon Aug 06 17:00:29 2018
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Private Interface 'en5' configured from GPnP for use as a private interconnect.
  [name='en5', type=1, ip=169.254.47.127, mac=7c-fe-90-cd-11-3b, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
Public Interface 'en4' configured from GPnP for use as a public interface.
  [name='en4', type=1, ip=1.2.3.4, mac=7c-fe-90-cd-11-3a, net=1.2.3.0/24, mask=255.255.255.0, use=public/1]
Public Interface 'en4' configured from GPnP for use as a public interface.
  [name='en4', type=1, ip=1.2.3.227, mac=7c-fe-90-cd-11-3a, net=1.2.3.0/24, mask=255.255.255.0, use=public/1]
Public Interface 'en4' configured from GPnP for use as a public interface.
  [name='en4', type=1, ip=1.2.3.113, mac=7c-fe-90-cd-11-3a, net=1.2.3.0/24, mask=255.255.255.0, use=public/1]
Picked latch-free SCN scheme 3
Autotune of undo retention is turned off.
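
When the goal is only to pull a few messages out of a very large alert log, the filtering fallback mentioned at the top can also be done as a stream, so the file never has to fit in an editor buffer. The following is a minimal sketch, not part of the original fix: the function name `grep_alert` is hypothetical, and the timestamp pattern assumes the 11g-style line format seen in the output above ("Mon Aug 06 17:00:29 2018").

```python
import re
from typing import Iterable, Iterator, Tuple

# Assumed 11g-style alert log timestamp line, e.g. "Mon Aug 06 17:00:29 2018".
TIMESTAMP = re.compile(
    r"^[A-Z][a-z]{2} [A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2} \d{4}$"
)


def grep_alert(lines: Iterable[str], keyword: str) -> Iterator[Tuple[str, str]]:
    """Yield (most recent timestamp line, matching line) pairs.

    `lines` can be an open file object, which Python reads lazily,
    so memory use stays flat regardless of file size.
    """
    last_ts = ""
    for raw in lines:
        line = raw.rstrip("\n")
        if TIMESTAMP.match(line):
            last_ts = line          # remember the timestamp that precedes messages
        elif keyword in line:
            yield last_ts, line
```

A typical call would be `with open("alert_anbob1.log", errors="replace") as fh:` followed by a loop over `grep_alert(fh, "ORA-")`.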

↧

12c alert: instance termination caused by ASMB process ORA-04031 init_heap_kfsg

September 4, 2020, 8:01 am

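Just last month I wrote up an ORA-4031 bug in "Troubleshooting ORA-4031: init_heap_kfsg consuming large amounts of memory in 12c, 18c, 19c", and that post mentioned this very bug. I did not expect to run into it at a customer site so soon. If you are on 12c R2 and have not yet installed the July 2020 RU, take note.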

alert log

    SELECT DISTINCT ORGID,ROAMREGION,ROAMORGID,OPERTYPE FROM V_ORG_ROAMREGION WHERE STATUS=1
Additional information: hd=c00000016a962890 phd=c0000003dca415a8 flg=0x20 cisid=866 sid=866 ciuid=866 uid=866
2020-09-03T13:12:31.147181+08:00
NOTE: ASMB0 terminating
2020-09-03T13:12:31.147479+08:00
Errors in file /oracle/app/oracle/diag/rdbms/anbob/anbob1/trace/anbob1_asmb_20366.trc:
ORA-04031: unable to allocate 4120 bytes of shared memory ("shared pool","unknown object","init_heap_kfsg","ASM extent pointer array")
2020-09-03T13:12:31.168644+08:00
Errors in file /oracle/app/oracle/diag/rdbms/anbob/anbob1/trace/anbob1_asmb_20366.trc:
ORA-04031: unable to allocate 4120 bytes of shared memory ("shared pool","unknown object","init_heap_kfsg","ASM extent pointer array")
USER (ospid: 20366): terminating the instance due to error 4031
2020-09-03T13:12:31.225940+08:00
opiodr aborting process unknown ospid (24232) as a result of ORA-1092

An error like this normally creates an incident in ADR. In the most recent incident directory, grep for the keyword "TOP ":

cd $ADR/incidents
find . -type f -print -exec grep "TOP " {} \;

The heap dump lists the top 10/20 memory consumers of each shared pool sub pool, as follows:

==============================================
TOP 10 MEMORY USES FOR SGA HEAP SUB POOL 4
----------------------------------------------
"init_heap_kfsg                 "  1355 MB 38%
"free memory                    "   456 MB 13%
"gcs resources                  "   235 MB  7%
"KSRMFV2 State Object           "   160 MB  4%
"gcs shadows                    "   145 MB  4%
"KKSSP                          "   125 MB  3%
"ges resource permanent         "    84 MB  2%
"KGLH0                          "    70 MB  2%
"ges resource dynamic           "    66 MB  2%
"event statistics per sess      "    65 MB  2%
     -----------------------------------------
free memory                         456 MB
memory alloc.                      3128 MB
Sub total                          3584 MB
==============================================
TOP 10 MAXIMUM MEMORY USES FOR SGA HEAP SUB POOL 4
----------------------------------------------
"init_heap_kfsg                 "  1355 MB
"SQLA                           "   690 MB
"free memory                    "   651 MB
"KGLH0                          "   424 MB
"gcs resources                  "   235 MB
"KSRMFV2 State Object           "   160 MB
"gcs shadows                    "   145 MB
"KKSSP                          "   131 MB
"KGLHD                          "    96 MB
"ges resource permanent         "    84 MB
==============================================
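
To compare sub pools across several incident files it can help to turn these blocks into numbers. The sketch below is one hedged way to do that: the regular expression is fitted only to the line layout shown above, and the helper names `parse_top_uses` and `share_of_subpool` are hypothetical.

```python
import re
from typing import List, Tuple

# Fits lines such as:  "init_heap_kfsg                 "  1355 MB 38%
# The percentage is optional because the MAXIMUM block omits it.
ENTRY = re.compile(r'^"(?P<name>[^"]+)"\s+(?P<mb>\d+) MB(?:\s+\d+%)?\s*$')


def parse_top_uses(lines: List[str]) -> List[Tuple[str, int]]:
    """Return (component, size_mb) pairs from one TOP n MEMORY USES block."""
    uses = []
    for line in lines:
        match = ENTRY.match(line.strip())
        if match:
            uses.append((match.group("name").strip(), int(match.group("mb"))))
    return uses


def share_of_subpool(uses: List[Tuple[str, int]], component: str,
                     subtotal_mb: int) -> float:
    """Percentage of the sub pool occupied by one component."""
    return 100.0 * dict(uses)[component] / subtotal_mb
```

With the figures above, `init_heap_kfsg` at 1355 MB out of a 3584 MB sub total works out to roughly 38%, matching the percentage printed in the dump.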

The memory area that maps ASM extents has grown abnormally: duplicate extent map array allocations, caused by an auto-merge issue, double the heap usage of the init_heap_kfsg subheap. The current RU is 200414, and it is not hard to find "ORA-4031 Due to High Growth in “init_heap_kfsg” Heap after Applying 12.2.0.1.200414RU" (Doc ID 2692922.1). The symptoms match Bug 31341859.

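Fix: install one-off patch 31341859 or upgrade to the July 2020 RU. The patch is small, but it has to be applied to both the GI and DB homes. If no downtime window is available, temporarily enlarge the shared pool or the shared reserved pool (a static parameter).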

↧

Troubleshooting db instance start failed PRCR-1064 CRS-2643 or CRS-2717 CRS-0223 during patching

September 4, 2020, 8:26 am

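The previous post, on the instance termination caused by the ASMB process hitting ORA-04031 init_heap_kfsg, covered this bug. Installing the patch for it did not go smoothly, so here are the problems we ran into.

First error: while applying patch 31341859 to the GI home, the installer reported that lib/libserver12.a did not exist. The fix was to copy the file from another node and rerun the installation.

Second error: the rootcrs.pl -prepatch script aborted with an error. We then wanted to bring the instance back up for the time being and retest later, but the instance failed to start.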

oracle@anbob2:/oracle/app/oracle/product/12.2.0.1/dbhome_1/dbs> $ORACLE_HOME/bin/srvctl start instance -d stbydb  -n anbob2
PRCR-1013 : Failed to start resource ora.stbydb.db
PRCR-1064 : Failed to start resource ora.stbydb.db on node anbob2
CRS-2643: The server pool(s) where resource 'ora.stbydb.db' could run have no servers

sqlplus / as sysdba

SQL*Plus: Release 12.2.0.1.0 Production on Fri Sep 4 16:23:30 2020

Copyright (c) 1982, 2016, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup
ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance
ORA-39510: CRS error performing start on instance 'tbcsa2' on 'stbydb'
CRS-2717: Server 'anbob2' is not in any of the server pool(s) hosting resource 'ora.stbydb.db'
CRS-0223: Resource 'ora.stbydb.db' has placement error.
clsr_start_resource:260 status:223
clsrapi_start_db:start_asmdbs status:223

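In some cases the "rootcrs.pl -prepatch" script sets the RESOURCE_USE_ENABLED flag to 0, and the flag is only set back to 1 by the later "rootcrs.pl -postpatch" script. If patching stops in between, the flag has to be set to 1 manually and the GI stack restarted.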

anbob2:[/]#crsctl get resource use
CRS-4966: Current resource use parameter value is 1 --(if it's 0)

anbob2:[/]#crsctl set resource use 1
CRS-4416: Server attribute 'RESOURCE_USE_ENABLED' successfully changed. Restart Oracle High Availability Services for new value to take effect.

anbob2:[/]#crsctl stop crs

anbob2:[/]#crsctl  start crs

After the restart the instance started normally.
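
When patching many nodes, the check can be scripted before any instance start is attempted. This is only a sketch under stated assumptions: it relies on the exact CRS-4966 message text shown above, and the helper names `resource_use_value` and `needs_reset` are hypothetical. The `crsctl` output would have to be captured separately, for example with `subprocess.run`.

```python
import re
from typing import Optional

# Message text copied from the CRS-4966 line in the session above.
CRS_4966 = re.compile(
    r"CRS-4966: Current resource use parameter value is (\d+)"
)


def resource_use_value(crsctl_output: str) -> Optional[int]:
    """Extract the value reported by `crsctl get resource use`, if present."""
    match = CRS_4966.search(crsctl_output)
    return int(match.group(1)) if match else None


def needs_reset(crsctl_output: str) -> bool:
    """True when RESOURCE_USE_ENABLED is reported as 0,
    i.e. the state that `crsctl set resource use 1` repairs."""
    return resource_use_value(crsctl_output) == 0
```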

↧

When the database meets Serverless?

September 6, 2020, 3:39 am

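Serverless is a complete process for building and managing applications based on a microservices architecture. It lets you manage application deployment at the service level rather than at the server level. It differs from traditional architectures in that it is fully managed by a third party, triggered by events, and runs in stateless, ephemeral compute containers (which may exist only for the duration of a single invocation). Building serverless applications means developers can focus on product code without having to manage or operate servers or runtimes, in the cloud or on premises. With Serverless, deploying an application truly involves no infrastructure build-out: services are built, deployed, and started automatically.

The major cloud vendors in China and abroad (Amazon, Microsoft, Google, IBM, Alibaba Cloud, Tencent Cloud, Huawei Cloud) have launched Serverless products one after another. Serverless has moved from concept and vision to real-world adoption across enterprises.

Serverless means no maintenance. It does not mean that servers disappear entirely; it means you no longer have to care or worry about the state of the servers, whether they are up, whether the application is running properly, and so on. Serverless means operations and maintenance are not your concern. With Serverless you need almost no DevOps. Serverless is not a specific technology but a concept.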

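The difference between cloud computing and Serverless

Microsoft defines it this way: "Cloud computing is the delivery of computing services (servers, storage, databases, networking, software, analytics, intelligence and more) over the Internet ("the cloud") to offer faster innovation, flexible resources, and economies of scale. You typically pay only for the cloud services you use, helping lower your operating costs, run your infrastructure more efficiently, and scale as your business needs change."

After years of development, cloud computing has evolved to the point where users only need to care about their business and the resources it requires. With an orchestration tool such as K8S, for example, users only have to think about their computation and the resources it needs (CPU, memory, and so on), not about the machine layer.

There are four main types of cloud computing:
1. Infrastructure as a Service (IaaS)
2. Platform as a Service (PaaS)
3. Software as a Service (SaaS)
4. Serverless

With a Serverless architecture people no longer worry about the resources needed to run their code; they focus on their own business logic and pay for the resources actually consumed. You could say that the real era of cloud computing only arrives with the rise of Serverless architecture.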

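Serverless: no server management

Serverless is not only about running on easily provisioned virtual servers. Going serverless means you do not want to manage those virtual machines either. You start a compute instance and connect to it. You define its shape (CPU, RAM), but you do not want to know where it physically runs. You do choose the region, for legal, performance, or cost reasons, but not which data center or which rack. This is the second step toward serverless: you do not manage physical servers. In Oracle Cloud you run a compute instance and install a database on it. In AWS it is an EC2 instance on which you install a database.

Serverless: no paying for servers

AWS offers a truly serverless and elastic database product: Amazon Aurora Serverless. You do not start or stop servers; that happens automatically when you connect. More activity adds more servers, and when there are no connections it stops. You do not pay for database servers that are merely running; you pay only for what the application actually uses. Azure also has a serverless SQL Server offering: https://docs.microsoft.com/zh-cn/azure/sql-database/sql-database-serverless

On the Oracle side, an Autonomous Database can be stopped and started. You can say that you do not pay while the database is not in use, but you cannot say that you do not pay while the application is not in use, because the database stays up even when the application is idle. Oracle has introduced a serverless standby database called Oracle Autonomous Data Guard. It can arguably be labelled "serverless" because you never see the standby server: you do not choose its shape and you do not connect to it. Switchover is fully transparent and automated, but on price you pay for the idle standby CPU and storage at the same rate as for the primary.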

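Serverless hosting offers some distinctive advantages over conventional cloud computing, which makes it an attractive option for many businesses:
1. No servers to manage or interact with
2. Compute resources are provided on demand to scale the site automatically
3. Resources are allocated precisely rather than in chunks
4. You pay only for the resources you consume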

↧

GoldenGate ORA-01400: cannot insert NULL into after “update” , like “upsert” or “merge”

September 7, 2020, 8:23 am

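Sometimes, for historical reasons or because another replication product was used before, a site moves to OGG without wanting to run a source/target consistency comparison, or is willing to tolerate some inconsistency, for example a target table that has fewer rows than the source. The plan is then simply to add the INSERTMISSINGUPDATES parameter on the Replicat side, so that an update whose target row does not exist is converted into an insert. A nice idea? Only if you understand how OGG actually works.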

2020-05-12 08:35:57  WARNING OGG-01003  Repositioning to rba 346238969 in seqno 322.
2020-05-12 08:35:57  WARNING OGG-01004  Aborted grouped transaction on ANBOB.OGG_SYNC_TABLE_HIS, Database error 1400 (OCI Error ORA-01400: cannot insert NULL into ("ANBOB"."OGG_SYNC_TABLE_HIS"."REC_SEQ") (status = 1400), SQL ).
2020-05-12 08:35:57  WARNING OGG-01003  Repositioning to rba 346238969 in seqno 322.
2020-05-12 08:35:57  WARNING OGG-01154  SQL error 1400 mapping TBCS.OGG_SYNC_TABLE_HIS to ANBOB.OGG_SYNC_TABLE_HIS OCI Error ORA-01400: cannot insert NULL into ("ANBOB"."OGG_SYNC_TABLE_HIS"."REC_SEQ") (status = 1400), SQL .
2020-05-12 08:35:57  ERROR   OGG-01296  Error mapping from TBCS.OGG_SYNC_TABLE_HIS to ANBOB.OGG_SYNC_TABLE_HIS.

***********************************************************************
*                   ** Run Time Statistics **                         *
***********************************************************************

Reading /oracle/ogg/ogg12/dirdat/yya/rp000000322, current RBA 346239182, 0 records, m_file_seqno = 322, m_file_rba = 346239182

Report at 2020-05-12 08:35:57 (activity since 2020-05-12 08:35:57)


...skipping 1 line
BATCHSQL statistics:

Last log location read:
     FILE:      /oracle/ogg/ogg12/dirdat/yya/rp000000322
     SEQNO:     322
     RBA:       346239182
     TIMESTAMP: 2020-05-11 18:11:37.004323
     EOF:       NO
     READERR:   0

2020-05-12 08:35:57  ERROR   OGG-01668  PROCESS ABENDING.

SQL error 1400 mapping TBCS.OGG_SYNC_TABLE_HIS to ANBOB.OGG_SYNC_TABLE_HIS 
OCI Error ORA-01400: cannot insert NULL into ("ANBOB"."OGG_SYNC_TABLE_HIS"."REC_SEQ") (status = 1400), SQL .

Source table

SQL> @desc tbcs.OGG_SYNC_TABLE_HIS
           Name                            Null?    Type
           ------------------------------- -------- ----------------------------
    1      WORKF_SEQ                       NOT NULL VARCHAR2(32)
    2      REC_SEQ                         NOT NULL VARCHAR2(32)
    3      ORDER_SEQ                       NOT NULL VARCHAR2(32)
    4      REGION                          NOT NULL NUMBER(5)
    5      ORDER_ID                        NOT NULL VARCHAR2(64)
    6      ORDER_PRI                       NOT NULL NUMBER(3)
    7      WORKF_ID                        NOT NULL VARCHAR2(128)
    8      SORT_ORDER                      NOT NULL NUMBER(3)
    9      WORKF_TYPE                               VARCHAR2(32)
   10      OPER_TYPE                       NOT NULL CHAR(1)
   11      PROC_SYSTEM                              VARCHAR2(32)
   12      PLAT_TYPE                       NOT NULL VARCHAR2(32)
   13      NE_ID                           NOT NULL VARCHAR2(32)
   14      TELNUM                          NOT NULL VARCHAR2(32)
   15      IMSI                                     VARCHAR2(20)
   16      CREATE_TIME                     NOT NULL DATE
   17      PROC_TIME                                DATE
....
   43      REAL_CREATETIME                          DATE

Target table

SQL> @desc tbcs.OGG_SYNC_TABLE_HIS
           Name                            Null?    Type
           ------------------------------- -------- ----------------------------
    1      WORKF_SEQ                       NOT NULL VARCHAR2(32)
    2      REC_SEQ                         NOT NULL VARCHAR2(32)
    3      ORDER_SEQ                       NOT NULL VARCHAR2(32)
    4      REGION                          NOT NULL NUMBER(5)
    5      ORDER_ID                        NOT NULL VARCHAR2(64)
    6      ORDER_PRI                       NOT NULL NUMBER(3)
    7      WORKF_ID                        NOT NULL VARCHAR2(128)
    8      SORT_ORDER                      NOT NULL NUMBER(3)
    9      WORKF_TYPE                               VARCHAR2(32)
   10      OPER_TYPE                       NOT NULL CHAR(1)
   11      PROC_SYSTEM                              VARCHAR2(32)
   12      PLAT_TYPE                       NOT NULL VARCHAR2(32)
   13      NE_ID                           NOT NULL VARCHAR2(32)
   14      TELNUM                          NOT NULL VARCHAR2(32)
   15      IMSI                                     VARCHAR2(20)
   16      CREATE_TIME                     NOT NULL DATE
   17      PROC_TIME                                DATE
....
   43      REAL_CREATETIME                          DATE

Analyzing the trail file

Logdump 1 >open /oracle/ogg/ogg12/dirdat/yya/rp000000322
Current LogTrail is /oracle/ogg/ogg12/dirdat/yya/rp000000322

Logdump 4 >position 346239182
Reading forward from RBA 346239182
Logdump 5 >detail on
Logdump 6 >ghdr on
Logdump 7 >ggstoken detail
Logdump 8 >n
___________________________________________________________________
Hdr-Ind : E (x45) Partition : . (x0c)
UndoFlag : . (x00) BeforeAfter: A (x41)
RecLength : 91 (x005b) IO Time : 2020/05/11 18:11:37.004.323
IOType : 15 (x0f) OrigNode : 255 (xff)
TransInd : . (x02) FormatType : R (x52)
SyskeyLen : 0 (x00) Incomplete : . (x00)
AuditRBA : 338336 AuditPos : 1127900176
Continued : N (x00) RecCount : 1 (x01)

2020/05/11 18:11:37.004.323 FieldComp Len 91 RBA 346239182 
Name: ANBOB.OGG_SYNC_TABLE_HIS (TDR Index: 2)
After Image: Partition 12 G e
0000 0019 0000 0015 4344 3230 3139 3132 3039 3431 | ........CD2019120941
3039 3539 3333 3339 3600 0300 0a00 0000 0000 0000 | 095933396...........
0001 3a00 0e00 1300 0000 0f34 3630 3030 3435 3332 | ..:........460004532
3833 3235 3836 000f 0015 0000 3230 3139 2d31 322d | 832586......2019-12-
3039 3a31 333a 3239 3a33 38 | 09:13:29:38
Column 0 (x0000), Len 25 (x0019)
Column 3 (x0003), Len 10 (x000a)
Column 14 (x000e), Len 19 (x0013)
Column 15 (x000f), Len 21 (x0015)

GGS tokens:
TokenID x52 'R' ORAROWID Info x00 Length 20
4141 6944 6b33 4142 5841 4146 384a 4d41 4157 0001 | AAiDk3ABXAAF8JMAAW..

FieldComp identifies records where a Compressed Update operation was written to the source database transaction log. A row in a SQL table was updated. In this format, only the changed bytes are present. Before images of unchanged columns are not logged by the database.

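Querying the source table shows that this row has data in its many NOT NULL columns, yet the trail record carries only 4 columns. That is the root of the problem: an insert built from this record has nothing to put into the other NOT NULL columns.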
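
The gap can be made explicit by comparing the column indexes in the logdump record with the NOT NULL columns from the table description. The sketch below is illustrative only and does not model OGG internals. It assumes that logdump column numbers are zero-based positions in the table's column order (which fits the data: column 14 holds an IMSI-like value and column 15 a timestamp), it covers only the 17 columns visible in the @desc output, and the helper names are hypothetical.

```python
import re
from typing import List

# (column name, NOT NULL?) in table order, copied from the @desc output above.
# Only the first 17 of the 43 columns are listed there.
COLUMNS = [
    ("WORKF_SEQ", True), ("REC_SEQ", True), ("ORDER_SEQ", True),
    ("REGION", True), ("ORDER_ID", True), ("ORDER_PRI", True),
    ("WORKF_ID", True), ("SORT_ORDER", True), ("WORKF_TYPE", False),
    ("OPER_TYPE", True), ("PROC_SYSTEM", False), ("PLAT_TYPE", True),
    ("NE_ID", True), ("TELNUM", True), ("IMSI", False),
    ("CREATE_TIME", True), ("PROC_TIME", False),
]

# Fits logdump lines such as: "Column 14 (x000e), Len 19 (x0013)"
COLUMN_LINE = re.compile(r"^Column\s+(\d+)\s+\(x[0-9a-f]{4}\)")


def present_columns(logdump_lines: List[str]) -> List[int]:
    """Column indexes that the trail record actually carries."""
    found = []
    for line in logdump_lines:
        match = COLUMN_LINE.match(line.strip())
        if match:
            found.append(int(match.group(1)))
    return found


def missing_not_null(present: List[int]) -> List[str]:
    """NOT NULL columns for which the record has no value."""
    have = set(present)
    return [name for idx, (name, not_null) in enumerate(COLUMNS)
            if not_null and idx not in have]
```

Fed the four `Column` lines of the FieldComp record, `missing_not_null` returns a list that starts with REC_SEQ, which is consistent with the column named in the ORA-01400 message.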

For comparison, here is what a real insert looks like:

Hdr-Ind    :     E  (x45)     Partition  :     .  (x0c)  
UndoFlag   :     .  (x00)     BeforeAfter:     A  (x41)  
RecLength  :   676  (x02a4)   IO Time    : 2020/05/11 21:39:46.004.393   
IOType     :     5  (x05)     OrigNode   :   255  (xff) 
TransInd   :     .  (x03)     FormatType :     R  (x52) 
SyskeyLen  :     0  (x00)     Incomplete :     .  (x00) 
AuditRBA   :     193974       AuditPos   : 2888195088 
Continued  :     N  (x00)     RecCount   :     1  (x01) 

2020/05/11 21:39:46.004.393 Insert               Len   676 RBA 499977859 
Name: ANBOB.OGG_SYNC_TABLE_HIS  (TDR Index: 2) 
After  Image:                                             Partition 12   G  s   
 0000 0019 0000 0015 4c46 3230 3230 3035 3131 3431 | ........LF2020051141  
 3730 3636 3430 3234 3400 0100 1600 0000 1233 3136 | 706640244........316  
 3230 3035 3131 3338 3233 3935 3531 3200 0200 1900 | 200511382395512.....  
 0000 154c 4632 3032 3030 3531 3132 3631 3338 3132 | ...LF202005112613812  
 3033 3138 0003 000a 0000 0000 0000 0000 013c 0004 | 0318.............<..  
 0010 0000 000c 3447 4c4c 5f4c 5445 5f44 454c 0005 | ......4GLL_LTE_DEL..  
 000a 0000 0000 0000 0000 001e 0006 000e 0000 000a | ....................  
Column     0 (x0000), Len    25 (x0019)  
Column     1 (x0001), Len    22 (x0016)  
Column     2 (x0002), Len    25 (x0019)  
Column     3 (x0003), Len    10 (x000a)  
Column     4 (x0004), Len    16 (x0010)  
Column     5 (x0005), Len    10 (x000a)  
Column     6 (x0006), Len    14 (x000e)  
Column     7 (x0007), Len    10 (x000a)  
Column     8 (x0008), Len    13 (x000d)  
Column     9 (x0009), Len     3 (x0003)  
Column    10 (x000a), Len    15 (x000f)  
Column    11 (x000b), Len     8 (x0008)  
Column    12 (x000c), Len     9 (x0009)  
Column    13 (x000d), Len    15 (x000f)  
Column    14 (x000e), Len    19 (x0013)  
Column    15 (x000f), Len    21 (x0015)  
Column    16 (x0010), Len    21 (x0015)  
Column    17 (x0011), Len    10 (x000a)  
Column    18 (x0012), Len    11 (x000b)  
Column    19 (x0013), Len    10 (x000a)  
Column    20 (x0014), Len    10 (x000a)  
Column    21 (x0015), Len    10 (x000a)  
Column    22 (x0016), Len     6 (x0006)  
Column    23 (x0017), Len     4 (x0004)  
Column    24 (x0018), Len     5 (x0005)  
Column    25 (x0019), Len    32 (x0020)  
Column    26 (x001a), Len    10 (x000a)  
Column    27 (x001b), Len    16 (x0010)  
Column    28 (x001c), Len     4 (x0004)  
Column    29 (x001d), Len     9 (x0009)  
Column    30 (x001e), Len     4 (x0004)  
Column    31 (x001f), Len     4 (x0004)  
Column    32 (x0020), Len     4 (x0004)  
Column    33 (x0021), Len    16 (x0010)  
Column    34 (x0022), Len     4 (x0004)  
Column    35 (x0023), Len     4 (x0004)  
Column    36 (x0024), Len     4 (x0004)  
Column    37 (x0025), Len    10 (x000a)  
Column    38 (x0026), Len    10 (x000a)  
Column    39 (x0027), Len    10 (x000a)  
Column    40 (x0028), Len    10 (x000a)  
Column    41 (x0029), Len    10 (x000a)  
Column    42 (x002a), Len    21 (x0015)  
  
GGS tokens: 
TokenID x52 'R' ORAROWID         Info x00  Length   20 
 4141 6a79 3439 4143 6f41 4146 6c66 7241 4155 0001 | AAjy49ACoAAFlfrAAU..  
TokenID x4c 'L' LOGCSN           Info x00  Length   14 
 3136 3731 3036 3331 3531 3634 3631                | 16710631516461  
TokenID x36 '6' TRANID           Info x00  Length   15 
 3438 3137 2e33 302e 3237 3934 3533 32             | 4817.30.2794532  
TokenID x69 'i' ORATHREADID      Info x01  Length    2 
 0002

Searching the trail for the key value

Logdump 66 >filter all reset;
Unknown filter keyword (ALL) 
Logdump 67 >filter inc reset;
Logdump 68 >pos 0
Reading forward from RBA 0 
Logdump 69 >filter inc string 'CD2019120941095933396'
Logdump 70 >n

2020/05/11 18:11:37.004.323 FieldComp            Len    91 RBA 346238969 
Name: ANBOB.OGG_SYNC_TABLE_HIS  (TDR Index: 2) 
Before Image:                                             Partition 12   G  b   
 0000 0019 0000 0015 4344 3230 3139 3132 3039 3431 | ........CD2019120941  
 3039 3539 3333 3339 3600 0300 0a00 0000 0000 0000 | 095933396...........  
 0001 3a00 0e00 1300 0000 0f34 3630 3030 3435 3332 | ..:........460004532  
 3839 3737 3536 000f 0015 0000 3230 3139 2d31 322d | 897756......2019-12-  
 3039 3a31 333a 3239 3a33 38                       | 09:13:29:38  
Column     0 (x0000), Len    25 (x0019)  
Column     3 (x0003), Len    10 (x000a)  
Column    14 (x000e), Len    19 (x0013)  
Column    15 (x000f), Len    21 (x0015)  
  
GGS tokens: 
TokenID x52 'R' ORAROWID         Info x00  Length   20 
 4141 6944 6b33 4142 5841 4146 384a 4d41 4157 0001 | AAiDk3ABXAAF8JMAAW..  
TokenID x4c 'L' LOGCSN           Info x00  Length   14 
 3136 3731 3035 3330 3439 3332 3134                | 16710530493214  
TokenID x36 '6' TRANID           Info x00  Length   14 
 3331 3137 2e38 2e33 3438 3439 3533                | 3117.8.3484953  
TokenID x69 'i' ORATHREADID      Info x01  Length    2 
 0001                                              | ..

Checking the trandata on the table

 GGSCI (qdyya2 as ggadmin@ANBOBa2) 5> info trandata ANBOB.OGG_SYNC_TABLE_HIS
Logging of supplemental redo log data is enabled for table ANBOB.OGG_SYNC_TABLE_HIS.
Columns supplementally logged for table ANBOB.OGG_SYNC_TABLE_HIS: CREATE_TIME, REGION, WORKF_SEQ.
Prepared CSN for table ANBOB.OGG_SYNC_TABLE_HIS: 16633737628994

OGG Replicat Encounters OGG-01396 OGG-00869 ORA-01400 on Primary Key Column (Doc ID 1308824.1)

First check the affected trail file yyy at RBA 123456 with logdump to verify, if the PK update does not have the complete key information as described.
If that is the case and the target table does not have the corresponding PK entry, this issue is hit. Otherwise it is something different.

As a workaround use
FETCHOPTIONS FETCHPKUPDATECOLS on the capture/extract side to get all the after images of
the record so that when a HANDLECOLLISIONS logic kicks in, it will be
able to successfully convert the original PK update into insert with all the after image present.

Note: With OGG v12.2 and above following parameters should be used in the Extract instead of FETCHOPTIONS FETCHPKUPDATECOLS ！！！
LOGALLSUPCOLS
UPDATERECORDFORMAT FULL

Checking the OGG parameter files
extract

GGSCI (qdyya2 as ggadmin@ANBOBa2) 7> view param EXT2

EXTRACT ext2
...
DISCARDFILE ./dirrpt/e.dsc, APPEND, MEGABYTES 200
TRANLOGOPTIONS LOGRETENTION DISABLED
TRANLOGOPTIONS _RawDeviceOffset 0
TRANLOGOPTIONS dblogreader
-- DBOPTIONS  ALLOWUNUSEDCOLUMN
FETCHOPTIONS NOUSESNAPSHOT
BR BRINTERVAL 30M
REPORTCOUNT EVERY 30 MINUTES, RATE
exttrail /ogg/ogg12/dirdat/eb, FORMAT RELEASE 12.2
FETCHOPTIONS FETCHPKUPDATECOLS   
dynamicresolution
GETUPDATEBEFORES
NOCOMPRESSUPDATES
NOCOMPRESSDELETES
--gettruncates

replicat

replicat R_YYA
...
DiscardFile ./dirrpt/r_yya.dsc, append, Megabytes 200
MaxDiscardRecs 10
REPERROR DEFAULT ABEND
DBOPTIONS NOSUPPRESSTRIGGERS
AllowNoopUpdates
ASSUMETARGETDEFS
CHECKSEQUENCEVALUE
Insertmissingupdates  
HandleCollisions    
GETTRUNCATES
BatchSQL

If ALLOWNOOPUPDATES is specified when the HANDLECOLLISIONS or INSERTMISSINGUPDATES parameters are being used, and if Oracle GoldenGate has all of the target key values, then Oracle GoldenGate will not ignore the update, but instead will apply it using all key columns in the SET clause and the WHERE clause (invoking APPLYNOOPUPDATES behavior). This is necessary so Oracle GoldenGate can detect if the row being updated is missing. If it is, then Oracle GoldenGate turns the update into an insert.

↧

Alert: In Oracle ADG, if the redo apply instance crashes, all other instances go from ‘OPEN’ to ‘MOUNT’

September 8, 2020, 8:46 am

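Today, in a 2-node RAC ADG environment on 11g R2, node 1 (the redo apply node) crashed because of a hardware fault. The applications connected to instance 2 were disconnected as well, even though both instances had been open and the ADG serves some read-only workloads. The DB alert log on node 2 shows no sign of anyone closing the instance manually, yet the instance had dropped to the MOUNT state by itself. Isn't RAC supposed to be highly available? Why does the death of one node drag the other node down with it?

Checking instance 2 at this point shows that its status is in fact "mount". Not many people know that the database actually has an ALTER DATABASE CLOSE command; however, once the database has been closed manually within the lifetime of an instance, it cannot be opened again in that instance. And as noted above, the alert log of instance 2 shows no sign of a manual close. An excerpt follows: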

# tbcsc2 dg
2020-09-03 00:10:39.601000 +08:00
ORA-01555 caused by SQL statement below (SQL ID: 4snkhx5vxrmv2, Query Duration=7340 sec, SCN: 0x0f46.c04d2402):
select....
2020-09-04 11:35:29.504000 +08:00
Archived Log entry 105312 added for thread 2 sequence 200520 ID 0x1fcb56a7 dest 1:
2020-09-08 14:16:17.954000 +08:00
Reconfiguration started (old inc 22, new inc 24)
List of instances:
 2 (myinst: 2)
 Global Resource Directory frozen
 * dead instance detected - domain 0 invalid = TRUE
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
 LMS 0: 23 GCS shadows cancelled, 0 closed, 0 Xw survived
 LMS 2: 34 GCS shadows cancelled, 0 closed, 0 Xw survived
 LMS 1: 19 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info
 Submitted all remote-enqueue requests
2020-09-08 14:16:19.005000 +08:00
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Post SMON to start 1st pass IR
2020-09-08 14:16:22.014000 +08:00
ARC1: Becoming the active heartbeat ARCH
ARC1: Becoming the active heartbeat ARCH
2020-09-08 14:16:23.328000 +08:00
 Submitted all GCS remote-cache requests
 Post SMON to start 1st pass IR
 Fix write in gcs resources
2020-09-08 14:16:24.368000 +08:00
Reconfiguration complete
Recovery session aborted due to instance crash
Close the database due to aborted recovery session
SMON: disabling tx recovery
2020-09-08 14:16:54.955000 +08:00
Stopping background process MMNL
Stopping background process MMON
2020-09-08 14:17:26.530000 +08:00
Background process MMON not dead after 30 seconds
Killing background process MMON
2020-09-08 14:18:04.907000 +08:00
Starting background process MMON
MMON started with pid=27, OS id=18743
Starting background process MMNL
MMNL started with pid=1865, OS id=18745
CLOSE: killing server sessions.
2020-09-08 14:18:07.003000 +08:00
Active process 3858 user 'grid' program 'oracle@anbob2'
Active process 14847 user 'grid' program 'oracle@anbob2'
Active process 3435 user 'grid' program 'oracle@anbob2'
Active process 25029 user 'grid' program 'oracle@anbob2'
Active process 9789 user 'grid' program 'oracle@anbob2'
Active process 23815 user 'grid' program 'oracle@anbob2'
...
Active process 24285 user 'itmuser' program 'oracle@anbob2 (TNS V1-V3)'
Active process 10045 user 'grid' program 'oracle@anbob2'
Active process 24229 user 'grid' program 'oracle@anbob2'
CLOSE: all sessions shutdown successfully.
SMON: disabling tx recovery
SMON: disabling cache recovery
2020-09-08 14:19:06.638000 +08:00


2020-09-08 14:26:56.608000 +08:00
alter database recover managed standby database using current logfile disconnect from session
Attempt to start background Managed Standby Recovery process (tbcsc2)
MRP0 started with pid=73, OS id=9746
MRP0: Background Managed Standby Recovery process started (tbcsc2)
2020-09-08 14:27:01.712000 +08:00
 started logmerger process
Managed Standby Recovery starting Real Time Apply
2020-09-08 14:27:08.171000 +08:00
Parallel Media Recovery started with 32 slaves
Waiting for all non-current ORLs to be archived...
All non-current ORLs have been archived.
Recovery of Online Redo Log: Thread 2 Group 41 Seq 201120 Reading mem 0
  Mem# 0: /dev/yyc_oravg04/ryyc_redo41
Media Recovery Log /yyc1_arch/arch_1_288357_920590168.arc
Completed: alter database recover managed standby database using current logfile disconnect from session
2020-09-08 14:27:29.221000 +08:00

-- Apply could not proceed because the archived logs on host 1 were missing, so we cancelled redo apply and aborted the database
alter database recover managed standby database cancel
2020-09-08 14:31:57.640000 +08:00


WARNING: inbound connection timed out (ORA-3136)
2020-09-08 14:33:30.958000 +08:00
Shutting down instance (abort)

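Only the line "Close the database due to aborted recovery session" gives a reason: the database was closed because the recovery session was aborted. This is in fact expected behaviour for RAC ADG. I cannot help grumbling here that Oracle MOS document titles seem to be written for Oracle engineers and specialists and can be quite baffling. For example, the note about the changed 12c alert log path is titled "12.1.0.2 Oracle Clusterware Diagnostic and Alert Log Moved to ADR" (Doc ID 1915729.1), and reading the contents of an operating system file from inside the database is called an "external table" (who knows what an internal table is). For this problem, at least, the title is not too far off: "Active Data Guard RAC Standby – Apply Instance Node Failure Impact" (Doc ID 1357597.1) gives a clear explanation.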

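In short: if the instance applying redo terminates abnormally, every other instance that is OPEN READ ONLY is closed. In a RAC ADG environment, an apply instance that crashes in the middle of applying redo leaves cache fusion locks behind on the surviving instances, and queries there could see inconsistent data. The database therefore has to be closed and reopened to bring the buffer cache and the data files back to a consistent state. With DG Broker configured (version later than 11.2.0.2) this is done automatically. Without it, simply open the database manually and then start redo apply by hand so that apply continues on a surviving node.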
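
Because the surviving instance logs this close without any manual command, it is easy to miss when reviewing an alert log. The following sketch scans an alert log for the two messages seen in the excerpt above and reports when the forced close happened. It is an illustration under assumptions: the message strings and the timestamp layout are taken from this one 11.2 log, and the function name `find_forced_closes` is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Timestamp lines in this log look like "2020-09-08 14:16:24.368000 +08:00".
TS = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ [+-]\d{2}:\d{2}$")
ABORTED = "Recovery session aborted due to instance crash"
CLOSED = "Close the database due to aborted recovery session"


def find_forced_closes(lines: Iterable[str]) -> List[Tuple[str, bool]]:
    """Return (timestamp, preceded_by_abort_message) for each forced close."""
    hits = []
    last_ts = ""
    saw_abort = False
    for raw in lines:
        line = raw.strip()
        if TS.match(line):
            last_ts = line
        elif line == ABORTED:
            saw_abort = True
        elif line == CLOSED:
            hits.append((last_ts, saw_abort))
            saw_abort = False   # start fresh for any later event
    return hits
```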

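If in your own RAC ADG environment, say on 11.2 or 12.2, you abort the apply instance and the other instances stay OPEN READ ONLY even though DG Broker is not configured, do not celebrate: your instances are probably hitting a bug, such as Bug 13147164 or Bug 12946790.

The relevant explanation from MOS: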
Symptoms
In an Active Data Guard RAC standby, if the redo apply instance crashes, all other instances of that standby that were open Read Only will be closed and returned to the MOUNT state. This disconnects all readers of the Active Data Guard standby.

Cause
In an Active Data Guard RAC standby, if the redo apply instance crashes in the middle of media recovery, it leaves the RAC cache fusion locks on the surviving instances and the data files on disk in an in-flux state. In such a situation, queries on the surviving instances can potentially see inconsistent data. To resolve this in-flux state, the entire standby database is closed. Upon opening the first instance after such a close, the buffer caches and datafiles are made consistent again.

↧

My driving licence journey: Subject 1 (theory test)

September 9, 2020, 11:48 pm

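I signed up with a driving school more than two years ago, and if the enrolment had not been about to expire I would not have remembered to take the test. Yesterday I found time to sit Subject 1. I had drilled practice questions for two weeks beforehand, but some questions were still not in the question bank. At first I figured I could afford 10 mistakes anyway and kept getting questions wrong; by question 80 I counted with a trembling hand and found I had already used up all 10. I finished the rest under pressure, submitted, scored 90, and passed. ~_~! …

Below are the notes I put together.

Link: https://pan.baidu.com/s/13YOI2CiQtgGF27GYaA79qQ
Extraction code: 8bqd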

↧

Oracle 11g R2: RMAN spin generates a large number of aud trace files

September 17, 2020, 7:23 pm

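A local file system usage alert fired. Analysis showed that trace files were being generated continuously under the audit directory, which records SYS logins. Writing at roughly 800-900 KB per second is clearly abnormal. As a stopgap we added a crontab job to purge the files periodically. The cause is an RMAN-related bug in the current version, which is recorded below. Version: 11.2.0.3 RAC on AIX.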

root@anbob1:/oracle/app/oracle/product/11.2.0.3/dbhome_1/rdbms/audit>while sleep 1; do du -sk . ; done;
87368   .
88180   .
89044   .
89956   .
90824   .
91496   .
92248   .
93096   .
93884   .
94680   .
95412   .
96280   .
97032   .
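
The write rate can be read straight off consecutive `du -sk` samples. A small sketch of the arithmetic, assuming the samples are exactly one second apart (the loop above sleeps 1 second and then runs `du`, so the real interval is slightly longer and the true rate slightly lower); the function names are hypothetical.

```python
from typing import List

# Directory sizes in KB, copied from the du -sk loop above.
SAMPLES_KB = [87368, 88180, 89044, 89956, 90824, 91496, 92248,
              93096, 93884, 94680, 95412, 96280, 97032]


def growth_rates(samples_kb: List[int], interval_s: float = 1.0) -> List[float]:
    """KB/s between each pair of consecutive samples."""
    return [(later - earlier) / interval_s
            for earlier, later in zip(samples_kb, samples_kb[1:])]


def average_rate(samples_kb: List[int], interval_s: float = 1.0) -> float:
    """Overall KB/s from the first sample to the last."""
    elapsed = interval_s * (len(samples_kb) - 1)
    return (samples_kb[-1] - samples_kb[0]) / elapsed
```

For these samples the per-interval rates range from 672 to 912 KB/s and the average is about 805 KB/s.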

root@anbob1:/oracle/app/oracle/product/11.2.0.3/dbhome_1/rdbms/audit>ls -l
total 21104
-rw-r-----    1 oracle   oinstall   10240174 Sep 18 09:30 orcl1_ora_43713172_13cb.aud
-rw-r-----    1 oracle   oinstall     556702 Sep 18 09:30 orcl1_ora_43713172_13cc.aud

root@anbob1:/oracle/app/oracle/product/11.2.0.3/dbhome_1/rdbms/audit>more orcl1_ora_43713172_13cb.aud
Audit file /oracle/app/oracle/product/11.2.0.3/dbhome_1/rdbms/audit/orcl1_ora_43713172_13cb.aud
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
ORACLE_HOME = /oracle/app/oracle/product/11.2.0.3/dbhome_1
System name:    AIX
Node name:      anbob1
Release:        1
Version:        7
Machine:        00F80C614C00
Instance name: orcl1
Redo thread mounted by this instance: 1
Oracle process number: 771
Unix process pid: 43713172, image: oracle@anbob1 (TNS V1-V3)

Fri Sep 18 09:30:12 2020 +08:00
LENGTH : '361'
ACTION :[211] 'begin sys.dbms_backup_restore.createRmanOutputRow( l0row_id    => :l0row_id, l0row_stamp => :l0row_stamp, row_id      => :row_id, row_stamp   => :row_stamp, txt         => :txt, samelin
e    => :i_sameline); end;'
DATABASE USER:[1] '/'
PRIVILEGE :[6] 'SYSDBA'
CLIENT USER:[6] 'oracle'
CLIENT TERMINAL:[0] ''
STATUS:[1] '0'
DBID:[10] '3519797422'

Fri Sep 18 09:30:12 2020 +08:00
LENGTH : '361'
ACTION :[211] 'begin sys.dbms_backup_restore.createRmanOutputRow( l0row_id    => :l0row_id, l0row_stamp => :l0row_stamp, row_id      => :row_id, row_stamp   => :row_stamp, txt         => :txt, samelin
e    => :i_sameline); end;'
DATABASE USER:[1] '/'
PRIVILEGE :[6] 'SYSDBA'
CLIENT USER:[6] 'oracle'
CLIENT TERMINAL:[0] ''
STATUS:[1] '0'
DBID:[10] '3519797422'

The audited call is backup-related, so check the RMAN processes:

anbob1:/home/oracle> ps -ef|grep rman
  oracle 42008748 63176732   0 09:31:27  pts/6  0:00 grep rman
  oracle 11667762 46139160  21 15:31:20      - 144:00 rman target sys/manager rcvcat rman/rman@veritas msglog /usr/openv/netbackup/ext/db_ext/scripts/oracle_arch.sh.out append
anbob1:/home/oracle> tail -f /usr/openv/netbackup/ext/db_ext/scripts/oracle_arch.sh.out
RMAN-00558: error encountered while parsing input commands
RMAN-01006: error signaled during parse
RMAN-02005: token too big

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00558: error encountered while parsing input commands
RMAN-01006: error signaled during parse
RMAN-02005: token too big


RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00558: error encountered while parsing input commands
RMAN-01006: error signaled during parse
RMAN-02005: token too big
...

This is Bug 12861117 (RMAN session spins), fixed in 11.2.0.4 and 12.1. A full disk or an RMAN input error sends the RMAN session into an endless spin. Killing the process is a temporary workaround.

root@anbob1:/oracle/app/oracle/product/11.2.0.3/dbhome_1/rdbms/audit>ps -ef|grep rman
    root 65275488 25692658   0 09:48:54  pts/6  0:00 grep rman
  oracle 11667762 46139160  17 15:31:20      - 146:27 rman target sys/manager rcvcat rman/rman@veritas msglog /usr/openv/netbackup/ext/db_ext/scripts/oracle_arch.sh.out append

root@anbob1:/oracle/app/oracle/product/11.2.0.3/dbhome_1/rdbms/audit>kill -9 11667762

↧

‘transaction’ event 2 & How to find dead transaction?

September 17, 2020, 7:47 pm

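Six years ago I wrote a post on the "transaction" event, "Tuning “transaction” & TX lock wait event, speeding up rollback dead transaction". Today I am adding some further information: how to find which transaction is dead.

A large number of active sessions were waiting on the event 'transaction'. For background on the event, see the earlier post.

1. First check whether there is enough undo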

SQL> select tablespace_name,status,sum(bytes)/1024/1024 mb from DBA_UNDO_EXTENTS group by tablespace_name,status; 

TABLESPACE_NAME                STATUS            MB
------------------------------ --------- ----------
UNDOTBS1                       ACTIVE     25055.125
UNDOTBS1                       EXPIRED     39912.75
UNDOTBS1                       UNEXPIRED  17884.625
UNDOTBS2                       ACTIVE           211
UNDOTBS2                       EXPIRED   33974.6875
UNDOTBS2                       UNEXPIRED 10753.8125

2. Check instance rollback progress

SQL> select usn,slt,seq, state,XID, undoblockstotal "Total", undoblocksdone "Done", undoblockstotal-undoblocksdone "ToDo",          
       decode(cputime,0,'unknown',sysdate+(((undoblockstotal-undoblocksdone) / (undoblocksdone / cputime)) / 86400)) "Estimated time to complete"   
       from v$fast_start_transactions;  

       USN        SLT        SEQ STATE            XID                                 Total       Done       ToDo Estimated time to
---------- ---------- ---------- ---------------- ------------------------------ ---------- ---------- ---------- -----------------
      2084         31   30239922 RECOVERING       0824001F01CD6CB2                  1284125       8181    1275944 20200916 15:49:39
      1892          6   36949771 RECOVERED        076400060233CF0B                       12         12          0 20200916 11:14:07
      2062         32   26586849 RECOVERED        080E00200195AEE1                       13         13          0 20200916 11:14:07
      1854          5   35122956 RECOVERED        073E00050217EF0C                       33         33          0 20200916 11:14:07
      2064         33   21948112 RECOVERED        08100021014EE6D0                       13         13          0 20200916 11:14:07
      1877         23   38143450 RECOVERED        07550017024605DA                       14         14          0 unknown
      2108         28   30836789 RECOVERED        083C001C01D68835                       11         11          0 20200916 11:14:07
      1792          9   30960200 RECOVERED        0700000901D86A48                       17         17          0 20200916 11:14:07
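
The "Estimated time to complete" column divides the remaining undo blocks by the recovery rate observed so far. A minimal Python sketch of the same arithmetic follows. The function name, the 120 seconds of `cputime` and the current time are made-up inputs, because the output above does not show that column.

```python
from datetime import datetime, timedelta

def rollback_eta(total_blocks, done_blocks, cpu_seconds, now):
    """Estimate when rollback finishes, mirroring the query's arithmetic.

    Returns None when no rate can be derived (cputime is 0 or nothing has
    been rolled back yet). The SQL above only guards the cputime = 0 case.
    """
    if cpu_seconds == 0 or done_blocks == 0:
        return None
    todo = total_blocks - done_blocks
    blocks_per_second = done_blocks / cpu_seconds
    return now + timedelta(seconds=todo / blocks_per_second)

# Total and Done come from the RECOVERING row above; cpu_seconds and the
# current time are hypothetical values chosen for illustration.
eta = rollback_eta(1284125, 8181, 120, datetime(2020, 9, 16, 10, 0, 0))
print(eta)
```

The estimate assumes the rollback rate stays constant, so it is a rough guide at best.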

3. Check for dead transactions

SQL> select * from x$ktuxe where ktuxesta!='INACTIVE';

ADDR                   INDX    INST_ID   KTUXEUSN   KTUXESLT   KTUXESQN  KTUXERDBF  KTUXERDBB  KTUXESCNB  KTUXESCNW KTUXESTA         KTUXECFL                   KTUXEUEL  KTUXEDDBF  KTUXEDDBB  KTUXEPUSN  KTUXEPSLT  KTUXEPSQN   KTUXESIZ
---------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------------- ------------------------ ---------- ---------- ---------- ---------- ---------- ---------- ----------
...
9FFFFFFF7F3F3610      69701          1       2048          5   30193948        934     809672 3893691017       3912 ACTIVE           NONE                             14          0          0          0          0          0          1
9FFFFFFF7F3F3668      70552          1       2073          6   28549131          0          0 3799617294       3912 ACTIVE           NONE                             18          0          0          0          0          0          0
9FFFFFFF7F3F3CF0      70809          1       2080         25   31394328        934     585592 3893260011       3912 ACTIVE           NONE                             19          0          0          0          0          0         79
9FFFFFFF7F3F3F00      70951          1       2084         31   30239922        934    2436487 3447352365       3912 ACTIVE           DEAD                           4504          0          0          0          0          0    1269278
9FFFFFFF7F3F3DF8      71084          1       2088         28   24954357          0          0 3893691006       3912 ACTIVE           NONE                              3          0          0          0          0          0          0
9FFFFFFF7F3F39D8      71242          1       2093         16   25330033        567      54688 1053850526       3907 ACTIVE           NONE                             16          0          0          0          0          0          7
9FFFFFFF7F3F3DA0      71389          1       2097         27   30301638          3     769294 3893662854       3912 ACTIVE           NONE                             21          0          0          0          0          0          1
...
72 rows selected.
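
To match the row flagged DEAD in x$ktuxe with the XID reported by v$fast_start_transactions, it helps to split the XID into its parts. The sketch below assumes the layout suggested by the outputs in this post: 2 bytes of undo segment number, 2 bytes of slot and 4 bytes of sequence, in the byte order displayed here. The function name is illustrative, and the byte order may differ on other platforms.

```python
def decode_xid(xid_hex):
    """Split a 16-hex-digit XID as displayed above into (usn, slot, sequence)."""
    if len(xid_hex) != 16:
        raise ValueError("expected 16 hex digits")
    usn = int(xid_hex[0:4], 16)    # undo segment number
    slot = int(xid_hex[4:8], 16)   # transaction table slot
    seq = int(xid_hex[8:16], 16)   # sequence (wrap) number
    return usn, slot, seq

print(decode_xid("0824001F01CD6CB2"))  # (2084, 31, 30239922)
```

The result lines up with KTUXEUSN=2084, KTUXESLT=31 and KTUXESQN=30239922, which is the row whose KTUXECFL is DEAD.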

4. Find when the transaction started

--scn wrap +base
SQL> select 3912*power(2,32)+3447352365 from dual;

3912*POWER(2,32)+3447352365
---------------------------
             16805359414317

SQL> select scn_to_timestamp(16805359414317) from dual;

SCN_TO_TIMESTAMP(16805359414317)
---------------------------------------------------------------------------
16-SEP-20 01.23.15.000000000 AM
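
The SCN used above is built from the KTUXESCNW (wrap) and KTUXESCNB (base) columns of the dead transaction's row. A small Python sketch of that composition and its inverse, with illustrative function names:

```python
def scn_from_wrap_base(wrap, base):
    """Combine SCN wrap (high part) and base (low 32 bits) into one number."""
    if not 0 <= base < 2**32:
        raise ValueError("base must fit in 32 bits")
    return (wrap << 32) | base

def scn_to_wrap_base(scn):
    """Inverse operation: split a full SCN back into (wrap, base)."""
    return scn >> 32, scn & 0xFFFFFFFF

# Values from the DEAD row: KTUXESCNW = 3912, KTUXESCNB = 3447352365
print(scn_from_wrap_base(3912, 3447352365))  # 16805359414317
```

Shifting by 32 bits is the same as multiplying by power(2,32) in the SQL above.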

5. Look up the session behind the transaction

SQL> select * from v$active_session_history where xid=hextoraw('0824001F01CD6CB2');

no rows selected

SQL> select min(sample_time),min(sample_time),USER_ID,instance_number,machine,program,sql_id,sql_opname,SQL_EXEC_START,event,WAIT_CLASS from dba_hist_active_sess_history where xid=hextoraw('0824001F01CD6CB2') and sample_time> sysdate-1 group by USER_ID,instance_number,machine,program,sql_id,sql_opname,SQL_EXEC_START,event,WAIT_CLASS;

MIN(SAMPLE_TIME)                    MIN(SAMPLE_TIME)               USER_ID INSTANCE_NUMBER MACHINE    PROGRAM                                 SQL_ID          SQL_OPNAME              SQL_EXEC_START    EVENT
-------------------------------- -- ----------------------------- -------- --------------- ---------- ----------------------------------- --------------- ----------------------- ----------------- --------------------                            
16-SEP-20 07.26.14.722 AM           16-SEP-20 07.26.14.722 AM            0               1 qdyyb1     sqlplus@qdyyb1 (TNS V1-V3)                                                                
16-SEP-20 07.25.53.953 AM           16-SEP-20 07.25.53.953 AM            0               1 qdyyb1     sqlplus@qdyyb1 (TNS V1-V3)                                                                    gc current grant 2-w                                                               
16-SEP-20 07.25.43.503 AM           16-SEP-20 07.25.43.503 AM            0               1 qdyyb1     sqlplus@qdyyb1 (TNS V1-V3)                                                                    db file sequential r                                                                
16-SEP-20 01.24.10.954 AM           16-SEP-20 01.24.10.954 AM            0               1 qdyyb1     sqlplus@qdyyb1 (TNS V1-V3)          09mws4h37zp3m   DELETE                  20200916 01:23:15                                                                
16-SEP-20 03.05.32.630 AM           16-SEP-20 03.05.32.630 AM            0               1 qdyyb1     sqlplus@qdyyb1 (TNS V1-V3)          09mws4h37zp3m   DELETE                  20200916 01:23:15 gc current request                                                                 
16-SEP-20 01.26.34.755 AM           16-SEP-20 01.26.34.755 AM            0               1 qdyyb1     sqlplus@qdyyb1 (TNS V1-V3)          09mws4h37zp3m   DELETE                  20200916 01:23:15 gc current grant 2-w                                                                
16-SEP-20 01.23.19.504 AM           16-SEP-20 01.23.19.504 AM            0               1 qdyyb1     sqlplus@qdyyb1 (TNS V1-V3)          09mws4h37zp3m   DELETE                  20200916 01:23:15 db file sequential r                                             
16-SEP-20 07.03.13.776 AM           16-SEP-20 07.03.13.776 AM            0               1 qdyyb1     sqlplus@qdyyb1 (TNS V1-V3)          09mws4h37zp3m   DELETE                  20200916 01:23:15 gc current grant con

6. Find the transaction's SQL

SQL> select * from dba_hist_sqltext where sql_id='09mws4h37zp3m';
                                                                                                             
      DBID SQL_ID                                                                                            
---------- ---------------                                                                                   
SQL_TEXT                                                                                                     
---------------------------------------------------------------------------
COMMAND_TYPE                                                                                                 
------------                                                                                                 
 343193180 09mws4h37zp3m                                                                                     
delete /*+parallel (t 16)*/from ANBOB.BIG_TABLES t where to_char(ENDDATE,'YYYYMMDD')<20200801

If you take a global hanganalyze, you can see that the sessions are waiting for SMON to roll back.

SQL> oradebug -g all hanganalyze 3
Hang Analysis in /oracle/app/oracle/diag/rdbms/tbcsb/tbcsb2/trace/tbcsb2_diag_24442.trc
SQL> exit

		   
Chains most likely to have caused the hang:
 [a] Chain 1 Signature: 'wait for stopper event to be increased'<='enq: TX - row lock contention'
     Chain 1 Signature Hash: 0xc0295d85
 [b] Chain 2 Signature: 'wait for stopper event to be increased'<='enq: TX - row lock contention'
     Chain 2 Signature Hash: 0xc0295d85
 [c] Chain 3 Signature: 'wait for stopper event to be increased'<='enq: TX - row lock contention'
     Chain 3 Signature Hash: 0xc0295d85

===============================================================================
Non-intersecting chains:

-------------------------------------------------------------------------------
Chain 1:
-------------------------------------------------------------------------------
    Oracle session identified by:
    {
                instance: 2 (tbcsb.tbcsb2)
                   os id: 12342
              process id: 9840, oracle@qdyyb2
              session id: 3
        session serial #: 16363
    }
    is waiting for 'enq: TX - row lock contention' with wait info:
    {
                      p1: 'name|mode'=0x54580006
                      p2: 'usn<<16 | slot'=0x824001f
                      p3: 'sequence'=0x1cd6cb2
            time in wait: 0.212598 sec
           timeout after: 2 min 59 sec
                 wait id: 575369
                blocking: 0 sessions
             current sql: DELETE FROM big_tables T WHERE T.CUSTGROUPID = :B3 AND T.CUSTNO = :B2 AND T.REGION = :B1
             short stack: ksedsts()+544<-ksdxfstk()+48<-ksdxcb()+3216<-sspuser()+688<-<-_pw_wait()+48<-pw_wait()+112<-sskgpwwait()
+432<-skgpwwait()+320<-ksliwat()+3328<-kslwaitctx()+304<-kslwa
it()+192<-$cold_ktcwit1()+8592<-$cold_kdddgb()+18256<-kdddel()+688<-kaudel()+96<-delrow()+2960<-qerdlFetch()+1456<-delexe()+2752<-opiexe()+
22032<-opipls()+4192<-opiodr()+2416<-rpidrus()+432<-skgmstack
()+224<-rpidru()+224<-rpiswu2()+1120<-rpidrv()+2736<-psddr0()+496<-psdnal()+1136<-pevm_EXECC()+1312<-pfrinstr_EXECC()+144<-pfrrun_no_tool()+192<-
            wait history:
              * time between current wait and wait #1: 0.000079 sec
              1.       event: 'transaction'
                 time waited: 1.011599 sec
                     wait id: 575368          p1: 'undo seg#|slot#'=0x824001f
                                              p2: 'wrap#'=0x1cd6cb2
                                              p3: 'count'=0xbe4
              * time between wait #1 and #2: 0.000007 sec
              2.       event: 'DFS lock handle'
                 time waited: 0.001039 sec
                     wait id: 575367          p1: 'type|mode'=0x54410005
                                              p2: 'id1'=0x3
                                              p3: 'id2'=0x824
              * time between wait #2 and #3: 0.000480 sec
              3.       event: 'enq: TX - row lock contention'
                 time waited: 0.337039 sec
                     wait id: 575366          p1: 'name|mode'=0x54580006
                                              p2: 'usn<<16 | slot'=0x824001f
                                              p3: 'sequence'=0x1cd6cb2
    }
    and is blocked by
 => Oracle session identified by:
    {
                instance: 1 (anbob.orcl1)
                   os id: 15416
              process id: 36, oracle@anbob1 (SMON)
              session id: 8137
        session serial #: 1
    }
    which is waiting for 'wait for stopper event to be increased' with wait info:
    {
            time in wait: 0.002368 sec
           timeout after: 0.097632 sec
                 wait id: 342812575
                blocking: 26 sessions
             current sql: 
             short stack: ksedsts()+544<-ksdxfstk()+48<-ksdxcb()+3216<-sspuser()+688<-<-_poll_sys()+48<-_poll()+224<-ssskgxp_poll()+208<-
sskgxp_selectex()+1872<-skgxpiwait()+9424<-skgxpwaiti()
+976<-skgxpwait()+416<-ksxpwait()+2880<-$cold_ksliwat()+2288<-kslwaitctx()+304<-kjusuc()+8080<-ksigeti()+2192<-
$cold_kturUndoSegmentNeedsRecovery()+400<-$cold_kturRecoverActiveTxns()+2816<-$cold_ktprb
eg()+8576<-ktmmon()+9008<-ktmSmonMain()+496<-ksbrdp()+2736<-opirip()+1296<-opidrv()+1152<-sou2o()+256<-opimai_real()+352<-ssthrdmain()+576<-main(
            wait history:
              * time between current wait and wait #1: 0.000501 sec
              1.       event: 'DFS lock handle'
                 time waited: 0.000225 sec
                     wait id: 342812574       p1: 'type|mode'=0x54410005
                                              p2: 'id1'=0x3
                                              p3: 'id2'=0xb74
              * time between wait #1 and #2: 0.000041 sec
              2.       event: 'DFS lock handle'
                 time waited: 0.000281 sec
                     wait id: 342812573       p1: 'type|mode'=0x54410005
                                              p2: 'id1'=0x3
                                              p3: 'id2'=0xb73
              * time between wait #2 and #3: 0.000043 sec
              3.       event: 'DFS lock handle'
                 time waited: 0.000534 sec
                     wait id: 342812572       p1: 'type|mode'=0x54410005
                                              p2: 'id1'=0x3
                                              p3: 'id2'=0xb72
    }
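
The wait parameters in the trace can be decoded to confirm that the blocked sessions are waiting on the same transaction found earlier. This sketch assumes the packing that the parameter labels themselves describe ('name|mode' and 'usn<<16 | slot'). The helper names are illustrative.

```python
def decode_enqueue_p1(p1):
    """Split a 'name|mode' (or 'type|mode') p1 into lock name and mode."""
    name = chr((p1 >> 24) & 0xFF) + chr((p1 >> 16) & 0xFF)
    mode = p1 & 0xFFFF
    return name, mode

def decode_tx_p2(p2):
    """Split a TX wait's p2 ('usn<<16 | slot') into (usn, slot)."""
    return p2 >> 16, p2 & 0xFFFF

print(decode_enqueue_p1(0x54580006))  # ('TX', 6)
print(decode_tx_p2(0x824001F))        # (2084, 31)
print(0x1CD6CB2)                      # 30239922
```

Undo segment 2084, slot 31 and sequence 30239922 are the same values as the RECOVERING transaction in v$fast_start_transactions. The same helper gives ('TA', 5) for the 'DFS lock handle' p1 of 0x54410005.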

↧

Oracle 19c RAC New Feature: Automatic Failback of a Service

September 29, 2020, 7:23 am


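High availability of database services has long been a RAC capability that other relational databases cannot match. With TAF configured for the application, when a database instance fails, the services that have that instance as their preferred instance fail over to another available instance. Unfortunately, once the instance comes back up, the service does not fail back to the original instance, and the DBA has to relocate the service manually. Oracle Database 19c changes this by adding automatic failback.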

$ srvctl status database -db anbob
Instance ANBOB1 is running on node rac1
Instance ANBOB2 is running on node rac2

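Let's create a simple service, then add the failover options and the failback option. These can of course also be specified in one step when the service is created.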

$ srvctl add service -db anbob -service anbob2_1 -preferred anbob2 -available anbob1
$ srvctl start service -db anbob -service anbob2_1
$ srvctl modify service -db anbob -service anbob2_1 -failovertype SESSION -failovermethod BASIC -failoverdelay 10 -failoverretry 3
$ srvctl modify service -db anbob -service anbob2_1 -failback YES

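Now simulate a failure by rebooting instance 2 or by killing one of its core background processes. Note that a normal instance shutdown does not trigger a service failover.

After instance 2 terminates abnormally (the database PMON process was killed):

$ srvctl status service -db anbob -service anbob2_1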
Service ANBOB2_1 is running on instance(s) RAC1

This is expected behavior, and earlier versions did the same. Shortly afterwards, however, oraagent brings the database instance back up.

$ srvctl status service -db anbob -service anbob2_1
Service ANBOB2_1 is running on instance(s) RAC2

The service has automatically moved back to node 2.

↧

Alert: Wrong execution plan for 12c top-N FETCH FIRST, fixed in 19c

October 3, 2020, 2:13 am


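Seven years ago, in "Oracle 12c new feature: OFFSET n FETCH n row-limit", I tried the new top-N syntax supported in 12c. It makes paging code in applications look cleaner, and internally it is implemented with a window function. If your application uses this syntax on a release before 19c, check whether the SQL performs worse than the older form that combines an ORDER BY subquery with ROWNUM. This is in fact an Oracle bug in 12c and 18c that is fixed in 19c, and it is one more small pitfall that argues for upgrading to 19c and skipping 12c. A customer who recently upgraded to 12c keeps running into surprises and bug after bug, and the fixes are almost all in 19c. Last year I also shared the post "浅谈Oracle Database 19c" (A Brief Look at Oracle Database 19c), based on a presentation by an Oracle product manager: in 19c the Oracle developers focused on fixing a large number of known bugs rather than introducing many new features. There is no reason to still choose 12c over 19c today, and the outdated rule of picking the "second newest" release should not be used to reject 19c. A short demonstration of the problem follows.

Create the test environment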

drop table t;
create table t nologging as
select d.* from dba_objects d,
( select 1 from dual connect by level <= 10 )
where object_id is not null;

alter table t noparallel;
alter table t modify object_id not null;

create index ix on t ( object_id ) ;
exec dbms_stats.gather_table_stats(user,'t');

-- Check the execution on 12c

[oracle@anbob ~]$ ora
SQL*Plus: Release 12.2.0.1.0 Production on Thu Oct 1 09:14:35 2020

SQL_ID  0qtbtttf5rs5y, child number 0
-------------------------------------
select * from ( select *   from   t   order by object_id desc   ) where
rownum <= 10
Plan hash value: 1635572796

----------------------------------------------------------------------------
| Id  | Operation                     | Name | E-Rows |E-Bytes| Cost (%CPU)|
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |      |        |       |    13 (100)|
|*  1 |  COUNT STOPKEY                |      |        |       |            |
|   2 |   VIEW                        |      |     10 |  4810 |    13   (0)|
|   3 |    TABLE ACCESS BY INDEX ROWID| T    |    728K|    92M|    13   (0)|
|   4 |     INDEX FULL SCAN DESCENDING| IX   |     10 |       |     3   (0)|
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=10) 

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         14  consistent gets
          0  physical reads
          0  redo size
       2757  bytes sent via SQL*Net to client
        607  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         10  rows processed

SQL_ID  8jd6tct901zsq, child number 0
-------------------------------------
select * from   t order by object_id asc fetch first 10 rows only

Plan hash value: 3047187157

----------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name | E-Rows |E-Bytes|E-Temp | Cost (%CPU)|  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |        |       |       | 33053 (100)|       |       |          |
|*  1 |  VIEW                    |      |     10 |  5070 |       | 33053   (1)|       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK|      |    728K|    92M|   135M| 33053   (1)|  6144 |  6144 | 6144  (0)|
|   3 |    TABLE ACCESS FULL     | T    |    728K|    92M|       |  3857   (1)|       |       |          |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=10)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "T"."OBJECT_ID")<=10)
22 rows selected.
Statistics
----------------------------------------------------------
          0  recursive calls
          4  db block gets
      14192  consistent gets
      14180  physical reads
          0  redo size
       2689  bytes sent via SQL*Net to client
        607  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
         10  rows processed

A FIRST_ROWS hint can work around this temporarily.
select /*+ FIRST_ROWS(10) */* 
from   t     
order by object_id asc 
fetch first 10 rows only; 

SQL_ID  dqmm3bfv24n73, child number 0
-------------------------------------
select /*+ FIRST_ROWS(10) */* from   t order by object_id asc fetch
first 10 rows only

Plan hash value: 4127887649
----------------------------------------------------------------------------
| Id  | Operation                     | Name | E-Rows |E-Bytes| Cost (%CPU)|
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |      |        |       |    13 (100)|
|*  1 |  VIEW                         |      |     10 |  5070 |    13   (0)|
|*  2 |   WINDOW NOSORT STOPKEY       |      |     10 |  1330 |    13   (0)|
|   3 |    TABLE ACCESS BY INDEX ROWID| T    |    728K|    92M|    13   (0)|
|   4 |     INDEX FULL SCAN           | IX   |     10 |       |     3   (0)|
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=10)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "T"."OBJECT_ID")<=10)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         15  consistent gets
          0  physical reads
          0  redo size
       2689  bytes sent via SQL*Net to client
        607  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         10  rows processed

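On 12c the FETCH FIRST syntax uses a full table scan, which causes a large number of logical reads (14192 consistent gets, against 14 for the ROWNUM form). The FIRST_ROWS(n) hint fixes this temporarily, but we do not want to bolt a FIRST_ROWS hint onto code that was meant to be cleaner than the old WHERE ROWNUM form. Now run the same top-N query on 19c.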

SQL> select comments from REGISTRY$HISTORY;

COMMENTS
--------------------------------------------------------------------------------
RDBMS_19.3.0.0.0DBRU_LINUX.X64_190417
Patch applied on 19.3.0.0.0: Release_Update - 190410122720
Elapsed: 00:00:00.01

SQL> select * from   t order by object_id asc fetch first 10 rows only;
10 rows selected.

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 4127887649

--------------------------------------------------------------------------------------
| Id  | Operation                     | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |      |    10 |  5070 |    13   (0)| 00:00:01 |
|*  1 |  VIEW                         |      |    10 |  5070 |    13   (0)| 00:00:01 |
|*  2 |   WINDOW NOSORT STOPKEY       |      |    10 |  1320 |    13   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T    |   724K|    91M|    13   (0)| 00:00:01 |
|   4 |     INDEX FULL SCAN           | IX   |    10 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=10)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "T"."OBJECT_ID")<=10)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         14  consistent gets
          0  physical reads
          0  redo size
       2897  bytes sent via SQL*Net to client
        427  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         10  rows processed

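19c uses the plan we want, so some bug fix must be responsible. If you have no access to MOS, you can first try looking in the v$system_fix_control view in the database. Oracle is not open source, but it exposes many views as query "interfaces", so there is no need to read the code. Search with "first" or "window" as the keyword and look for related fixes introduced in 19c.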

SQL> @sysfix window

     BUGNO      VALUE SQL_FEATURE                         DESCRIPTION                                                      OPTIMIZER_FEATURE_ENABLE       EVENT IS_DEFAULT     CON_ID
---------- ---------- ----------------------------------- ---------------------------------------------------------------- ------------------------- ---------- ---------- ----------
  25323193          1 QKSFM_COMPILATION_25323193          Remove pruned window functions from select and order by          8.0.0                              0          1          1
  22174392          1 QKSFM_FIRST_ROWS_22174392           first k row optimization for window function rownum predicate    19.1.0                             0          1          1
  23002609          1 QKSFM_EXECUTION_23002609            Clear key count of window OG (containing GBY) w/ constant keys o 12.2.0.1                           0          1          1
  17986549          1 QKSFM_FILTER_PUSH_PRED_17986549     push valid filters into UNION ALL branches with window functions 12.2.0.1                           0          1          1
  13735304          1 QKSFM_TRANSFORMATION_13735304       relax restrictions on window function replaces subquery          12.1.0.1                           0          1          1
  13321547          1 QKSFM_ACCESS_PATH_13321547          Avoid WINDOW SORT/WINDOW BUFFER SORT when index is already sorte 11.2.0.4                           0          1          1
  10226906          1 QKSFM_SQL_CODE_GENERATOR_10226906   ignore OBY clumping for grand-total window functions             11.2.0.3                           0          1          1
  12410972          1 QKSFM_FILTER_PUSH_PRED_12410972     push predicate with NLS_SORT in window function                  11.2.0.3                           0          1          1
  10230017          1 QKSFM_SQL_CODE_GENERATOR_10230017   use range parallelism for window function count on a constant    11.2.0.3                           0          1          1
   9024933          1 QKSFM_JPPD_9024933                  Do not allow Old JPPD for OJ view with window function           11.2.0.2                           0          1          1
   7127530          1 QKSFM_TRANSFORMATION_7127530        window function replaces having subquery                         11.2.0.1                           0          1          1
   7388652          1 QKSFM_TRANSFORMATION_7388652        window function replaces uncorrelated subquery with view         11.2.0.1                           0          1          1
   7385140          1 QKSFM_TRANSFORMATION_7385140        early window function removal with CBQT                          11.2.0.1                           0          1          1
   6119510          1 QKSFM_JPPD_6119510                  Allow JPPD for union-all views with window functions             11.1.0.6                           0          1          1
   6146906          1 QKSFM_TRANSFORMATION_6146906        amend fix of bug 3697218 for window func                         10.2.0.5                           0          1          1
   7576516          1 QKSFM_SQL_CODE_GENERATOR_7576516    make only the topmost window node positionable                   10.2.0.5                           0          1          1
   5302124          1 QKSFM_TRANSFORMATION_5302124        Allow CBQT for queries with window functions                     10.2.0.4                           0          1          1

17 rows selected.

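Note:
One fix, 22174392, looks very relevant. A value of 1 means the fix is enabled. I can turn this fix off to see whether the problem reproduces, and a fix can even be disabled at the level of a single SQL statement.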

select /*+ opt_param('_fix_control' '22174392:OFF') */ * from   t order by object_id asc fetch first 10 rows only;
Plan hash value: 3047187157

-----------------------------------------------------------------------------------------
| Id  | Operation                | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |    10 |  5070 |       | 25200   (1)| 00:00:01 |
|*  1 |  VIEW                    |      |    10 |  5070 |       | 25200   (1)| 00:00:01 |
|*  2 |   WINDOW SORT PUSHED RANK|      |   724K|    91M|   131M| 25200   (1)| 00:00:01 |
|   3 |    TABLE ACCESS FULL     | T    |   724K|    91M|       |  3827   (1)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=10)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "T"."OBJECT_ID")<=10)

16 rows selected.

Elapsed: 00:00:00.03

Note:
This confirms that the fix for bug 22174392 in 19c resolves the wrong cost estimate, and the resulting bad execution plan, for the FETCH FIRST top-N syntax on 12c.
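
As a loose analogy in Python (not Oracle internals), the difference between the two plans is how many rows must be touched to return the first 10. Sorting an unordered row source reads everything, while walking an already ordered index can stop early. All names and the row count below are illustrative.

```python
import heapq
import random
from itertools import islice

def counting(iterable, counter):
    """Yield items unchanged while counting how many were read."""
    for item in iterable:
        counter[0] += 1
        yield item

def top_n_by_sorting_everything(rows, n):
    """Analogous to TABLE ACCESS FULL + WINDOW SORT PUSHED RANK."""
    counter = [0]
    result = heapq.nsmallest(n, counting(rows, counter))
    return result, counter[0]

def top_n_from_ordered_index(index_entries, n):
    """Analogous to INDEX FULL SCAN + WINDOW NOSORT STOPKEY."""
    counter = [0]
    result = list(islice(counting(index_entries, counter), n))
    return result, counter[0]

object_ids = list(range(100000))
random.Random(0).shuffle(object_ids)  # unordered "table"
index = sorted(object_ids)            # ordered "index"

print(top_n_by_sorting_everything(object_ids, 10)[1])  # 100000
print(top_n_from_ordered_index(index, 10)[1])          # 10
```

Both functions return the same 10 rows. Only the amount of work differs, which is what the consistent gets figures above reflect.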

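Choosing a suitable database version avoids unnecessary performance and stability trouble. For now and for the next three years, 19c with a recent RU is the recommended Oracle Database release. Thorough SPA or RAT testing before an upgrade is especially important.

If you have concerns about an upgrade, you can reach me through the contact details on the www.anbob.com home page.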

↧

19c Flashback Standby after Flashback (resetlogs) on Primary In Dataguard Environment

October 3, 2020, 2:54 am


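Sometimes an application release needs some testing, and once the database operations are done you want to flash the database back to the point before the changes, using a restore point or point-in-time recovery, and then have the standby resume applying redo so that Data Guard recovers. Because the primary has to be opened with OPEN RESETLOGS after a flashback, a Data Guard environment needs extra care if you do not want to rebuild the standby. Oracle 19c also introduces a new feature that lets the standby flash back automatically.

1. Using standby + failover + flashback database

# Enable flashback database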

SQL> alter database flashback on;

# Flashback retention target

SQL> show parameter db_flashback_retention_target

# Check the oldest flashback time

SQL> select oldest_flashback_scn, oldest_flashback_time from v$flashback_database_log;

# Check the SCN at which the standby became primary

SQL> select STANDBY_BECAME_PRIMARY_SCN from v$database;

# Activate the standby

alter database recover managed standby database finish;
alter database commit to switchover to primary with session shutdown;

-- do something

# Flash the standby back to before the failover

SQL> flashback database to scn <STANDBY_BECAME_PRIMARY_SCN from above>;
SQL> alter database convert to physical standby;
SQL> shutdown immediate;

2. Using primary + flashback database

# Confirm that flashback database is enabled on both primary and standby
-- PRIMARY DB

SQL> select name,database_role,flashback_on from v$database;
NAME DATABASE_ROLE FLASHBACK_ON
--------- ---------------- ------------------
PRIMDB PRIMARY YES

-- STANDBY DB
SQL> select name,database_role,flashback_on from v$database;

NAME DATABASE_ROLE FLASHBACK_ON
--------- ---------------- ------------------
STDBY PHYSICAL STANDBY YES


# Create a restore point on the primary

create restore point BEFORE_TEST GUARANTEE FLASHBACK DATABASE;

-- do something

# Shut down the primary and restart it in mount mode

SQL> shutdown immediate
SQL> startup mount

# Flash the primary database back to the restore point

SQL> flashback database to restore point BEFORE_TEST;

# Open the primary database with OPEN RESETLOGS

SQL> alter database open resetlogs;

# Stop redo apply on the standby

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

# Flash back the standby database

SQL> select scn,NAME from v$restore_point;
SQL> FLASHBACK STANDBY DATABASE TO SCN nnn;

# Resume redo apply on the standby

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;
Database altered

NOTE:
If you hit the following error, the standby controlfile has to be recreated:
Sequence nnn does not yet exist in new incarnation, and it has been already applied in the old.

3. 19c new feature: automatic standby flashback

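19c introduces a new feature: in an Oracle Data Guard configuration, when a flashback or point-in-time recovery is performed on the primary database, the standby database performs the same operation. In earlier versions we had to obtain the RESETLOGS SCN on the primary and then manually issue FLASHBACK DATABASE on the standby before managed recovery could be restarted and redo apply could continue. Another new 19c feature is that when a restore point is created on the primary, a restore point is automatically created on the standby as well. These are called replicated restore points, and their names carry the suffix "_PRIMARY".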

# on primary 
SQL> select flashback_on from v$database;
SQL> create restore point orcl_grp guarantee flashback database;

# on standby 
SQL> select name from v$restore_point;
NAME
--------------------------------------------------------------------------------
ORCL_GRP_PRIMARY
SQL> select NAME,REPLICATED from v$restore_point;

NAME			       REP
------------------------------ ---
ORCL_GRP_PRIMARY	       YES

-- do something

# on primary
SQL> shutdown immediate
SQL> startup mount;
SQL> flashback database to restore point orcl_grp;
SQL> alter database open resetlogs;

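# on standby
Once the standby has been shut down and restarted in mount mode, MRP performs the flashback automatically. When the alert log shows "Flashback Media Recovery Complete", the standby can be opened read only and redo apply can continue.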

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;
Database altered

↧

Oracle 19c: USE_LARGE_PAGES can configure HugePages automatically on Linux

November 4, 2020, 6:51 am


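HugePages is a recommended configuration for Oracle databases on Linux, and it is likewise recommended for PostgreSQL and other systems that combine shared memory with many processes, because with the default 4K pages the page tables take up a very large amount of memory. The usual approach is to set the HugePages size and page count through the Linux kernel parameters in sysctl.conf, and to use HugePages only when Oracle is not using AMM. The Oracle database parameter related to large pages is USE_LARGE_PAGES. In 19c the value AUTO_ONLY lets the instance have the Linux kernel allocate HugePages on demand at startup, even when HugePages were not preconfigured in the OS.

The USE_LARGE_PAGES parameter was introduced in 11.2.0.2. The values at that time were TRUE (the default), FALSE and ONLY.

TRUE means use large pages. In 11.2.0.2, if the preallocated HugePages were insufficient, the instance used none of them at startup and used them only when there were enough. In 11.2.0.3 this behavior changed: Oracle allocates as much of the SGA as it can in large pages and, when they run out, allocates the rest of the SGA in small pages, giving a mix of page sizes.

FALSE means large pages are not used.

ONLY means only large pages are used. If the operating system cannot provide enough large pages, the instance fails to start.

11.2.0.3 also introduced another possible value, AUTO, which was undocumented and not supported by Oracle Support. It could be used in 12.2 to have Oracle configure HugePages automatically on Linux. It was desupported in 19c: the value is still listed in V$PARAMETER_VALID_VALUES, but using it raises ORA-27107.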

[oracle@oel7db1 ~]$ oerr ora 27107
27107, 0000, "AUTO value for USE_LARGE_PAGES parameter is no longer supported"
// *Cause: The USE_LARGE_PAGES configuration parameter was set to AUTO.
// *Action: Consult the alert file for details.

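19c introduces another value, AUTO_ONLY; it seems AUTO was dropped for reasons that are not public. AUTO_ONLY combines AUTO and ONLY: the instance grows the HugePages allocation automatically and uses only large pages, and if the HugePages cannot be allocated, the instance fails to start. On Exadata, AUTO_ONLY is the default, while on-premises the default is still TRUE. I am not sure whether the AUTO_ONLY default applies only to Exadata; in my Oracle 19.3 non-Exadata VM, for example, the default is "TRUE" (meaning Oracle uses whatever large pages it can get and allocates the remaining memory in small pages). The main advantage of AUTO_ONLY is that we no longer need to recalculate and set the total number of large pages at the OS level; the database itself creates large pages from available memory, which makes configuration much easier on large systems, especially with rapid orchestration.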

AUTO_ONLY

This setting is available starting with Oracle Database 19c and it is the default setting for Exadata systems. It specifies that, during startup, the instance will calculate and request the number of large pages it requires. If the operating system can fulfill this request, then the instance will start successfully. If the operating system cannot fulfill this request, then the instance will fail to start. This ensures that no instances will run with under-provisioned large pages.

Note: USE_LARGE_PAGES is set to FALSE automatically in an Oracle ASM instance when MEMORY_TARGET is enabled. In this case, the FALSE setting does not cause performance degradation.
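
The values described above can be summarized as a small decision function. This is only a model of the behavior as described in this post (TRUE follows the 11.2.0.3+ mixed-page behavior); the function name and the CAN_GROW argument are invented for illustration and are not Oracle interfaces.

```shell
#!/bin/sh
# usage: large_page_outcome MODE NEEDED FREE [CAN_GROW]
#   MODE      TRUE | FALSE | ONLY | AUTO_ONLY
#   NEEDED    large pages the SGA requires
#   FREE      large pages currently free in the OS
#   CAN_GROW  yes/no: whether the OS could add the missing pages on request
large_page_outcome() {
    mode=$1; needed=$2; free=$3; can_grow=${4:-no}
    case "$mode" in
        FALSE)
            echo "start: small pages only" ;;
        TRUE)
            if [ "$free" -ge "$needed" ]; then
                echo "start: large pages only"
            elif [ "$free" -gt 0 ]; then
                echo "start: mixed large and small pages"
            else
                echo "start: small pages only"
            fi ;;
        ONLY)
            if [ "$free" -ge "$needed" ]; then
                echo "start: large pages only"
            else
                echo "fail: not enough large pages"
            fi ;;
        AUTO_ONLY)
            # The instance asks the OS for the pages it is missing.
            if [ "$free" -ge "$needed" ] || [ "$can_grow" = yes ]; then
                echo "start: large pages only"
            else
                echo "fail: not enough large pages"
            fi ;;
        *)
            echo "unknown mode: $mode" >&2; return 2 ;;
    esac
}
```

For example, `large_page_outcome AUTO_ONLY 336 0 yes` reports a successful start on large pages, while the same call with `ONLY` reports a failure.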

Testing starts below, on Oracle Linux 7 + Oracle 19.3 on-premises.
1, Without large pages

[root@oel7db1 ~]# cat /proc/meminfo
MemTotal:        3765384 kB
MemFree:         3557852 kB
MemAvailable:    3623672 kB
Buffers:            2108 kB
Cached:           107400 kB
SwapCached:            0 kB
...
Shmem:              8704 kB
Slab:              28048 kB
SReclaimable:      15932 kB
SUnreclaim:        12116 kB
KernelStack:        2128 kB
PageTables:         4432 kB
...
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       65472 kB
DirectMap2M:     4128768 kB

SQL> startup mount
ORACLE instance started.

Total System Global Area 1073738888 bytes
Fixed Size                  9143432 bytes
Variable Size             792723456 bytes
Database Buffers          268435456 bytes
Redo Buffers                3436544 bytes
Database mounted.

SQL> show parameter large

PARAMETER_NAME                                               TYPE        VALUE
------------------------------------------------------------ ----------- ----------------------------------------------------------------------------------------------------
large_pool_size                                              big integer 0
use_large_pages                                              string      FALSE

# db alert log
****************************************************
 /dev/shm will be used for creating SGA
Large pages will not be used. Only standard 4K pages will be used
****************************************************
**********************************************************************
Dump of system resources acquired for SHARED GLOBAL AREA (SGA)
 Per process system memlock (soft) limit = 128G
 Expected per process system memlock (soft) limit to lock
 instance MAX SHARED GLOBAL AREA (SGA) into memory: 1024M
 Available system pagesizes:
  4K, 2048K
 Supported system pagesize(s):
  PAGESIZE  AVAILABLE_PAGES  EXPECTED_PAGES  ALLOCATED_PAGES  ERROR(s)
        4K       Configured          262150          262150        NONE
 Reason for not supporting certain system pagesizes:
  2048K - Dynamic allocate and free memory regions
**********************************************************************
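
The HugePages counters checked by hand above can be pulled out of /proc/meminfo with a one-line awk filter. This is a small sketch; the function name is my own, and the optional file argument exists only so the parser can be tried on a saved copy.

```shell
#!/bin/sh
# Print the numeric value of one /proc/meminfo field, e.g. HugePages_Total.
# $1 = field name, $2 = file to read (defaults to /proc/meminfo)
meminfo_field() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Example:
#   meminfo_field HugePages_Total
#   meminfo_field Hugepagesize
```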

2, AUTO_ONLY with AMM enabled

SQL> @pvalid use_large_pages
Display valid values for multioption parameters matching "use_large_pages"...

  PAR# PARAMETER                                                 ORD VALUE                          DEFAULT
------ -------------------------------------------------- ---------- ------------------------------ -------
   167 use_large_pages                                             1 TRUE                           DEFAULT
       use_large_pages                                             2 AUTO
       use_large_pages                                             3 ONLY
       use_large_pages                                             4 FALSE
       use_large_pages                                             5 AUTO_ONLY

SQL> alter system set use_large_pages=AUTO_ONLY scope=spfile;

System altered.

SQL> shut abort
ORACLE instance shut down.

SQL> startup mount
ORA-27138: unable to allocate large pages with current parameter setting
Additional information: 9571

# ALERT LOG
Starting ORACLE instance (normal) (OS id: 2911)
****************************************************
 /dev/shm will be used for creating SGA
Large pages will not be used. Only standard 4K pages will be used
****************************************************
**********************************************************************
Dump of system resources acquired for SHARED GLOBAL AREA (SGA)
 Per process system memlock (soft) limit = 128G
 Expected per process system memlock (soft) limit to lock
 instance MAX SHARED GLOBAL AREA (SGA) into memory: 1024M
 Available system pagesizes:
  4K, 2048K
 Supported system pagesize(s):
  PAGESIZE  AVAILABLE_PAGES  EXPECTED_PAGES  ALLOCATED_PAGES  ERROR(s)
        4K       Configured          262150          262150        NONE
 Reason for not supporting certain system pagesizes:
  2048K - Dynamic allocate and free memory regions
**********************************************************************
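
In this test the instance refuses to start with AUTO_ONLY while memory_target is still set. A quick pre-check on a text pfile might look like the sketch below. The function name and the simplified parsing (one `name=value` per line, optional `*.` or `sid.` prefix) are my own assumptions, not an Oracle utility.

```shell
#!/bin/sh
# Returns 0 (true) if the pfile sets memory_target or memory_max_target
# to a non-zero value, i.e. AMM is enabled.
# $1 = path to a text pfile (init.ora)
amm_enabled() {
    awk -F= '{
        name = tolower($1); gsub(/[ \t]/, "", name)
        sub(/^.*\./, "", name)              # drop a "*." or "sid." prefix
        value = $2; gsub(/[ \t]/, "", value)
        if ((name == "memory_target" || name == "memory_max_target") &&
            value != "" && value != "0")
            found = 1
    }
    END { exit !found }' "$1"
}

# Example (the pfile path is a placeholder):
#   if amm_enabled /path/to/init.ora; then
#       echo "AMM is on: disable memory_target before using AUTO_ONLY"
#   fi
```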

3, AMM disabled, AUTO_ONLY

-- Use a pfile to restore use_large_pages to TRUE
SQL> show parameter target

PARAMETER_NAME                                               TYPE        VALUE
------------------------------------------------------------ ----------- ----------------------------------------------------------------------------------------------------
memory_max_target                                            big integer 1G
memory_target                                                big integer 1G
parallel_servers_target                                      integer     20
pga_aggregate_target                                         big integer 0
sga_target                                                   big integer 0
target_pdbs                                                  integer     2
SQL> alter system set memory_target=0;

System altered.

SQL> show parameter sga

PARAMETER_NAME                                               TYPE        VALUE
------------------------------------------------------------ ----------- --------------------------------------
allow_group_access_to_sga                                    boolean     FALSE
lock_sga                                                     boolean     FALSE
pre_page_sga                                                 boolean     TRUE
sga_max_size                                                 big integer 668M
sga_min_size                                                 big integer 0
sga_target                                                   big integer 668M
unified_audit_sga_queue_size                                 integer     1048576

SQL> show parameter target

PARAMETER_NAME                                               TYPE        VALUE
------------------------------------------------------------ ----------- --------------------------------------
memory_max_target                                            big integer 0
memory_target                                                big integer 0
parallel_servers_target                                      integer     20
pga_aggregate_target                                         big integer 356M
sga_target                                                   big integer 668M
target_pdbs                                                  integer     1

SQL> show parameter large

PARAMETER_NAME                                               TYPE        VALUE
------------------------------------------------------------ ----------- ----------------------------------------------------------------------------------------------------
large_pool_size                                              big integer 0
use_large_pages                                              string      TRUE

SQL> alter system set use_large_pages=AUTO_ONLY scope=spfile;
System altered.

SQL> shut immediate;
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.

SQL> startup mount
ORACLE instance started.

Total System Global Area  700445640 bytes
Fixed Size                  9139144 bytes
Variable Size             419430400 bytes
Database Buffers          268435456 bytes
Redo Buffers                3440640 bytes
Database mounted.
SQL>


# db alert log
2020-11-04 05:04:22.275000 -05:00
Starting ORACLE instance (normal) (OS id: 3567)
****************************************************
 Sys-V shared memory will be used for creating SGA
 ****************************************************
DISM started, OS id=3579
**********************************************************************
Dump of system resources acquired for SHARED GLOBAL AREA (SGA)
 Per process system memlock (soft) limit = 128G
 Expected per process system memlock (soft) limit to lock
 instance MAX SHARED GLOBAL AREA (SGA) into memory: 672M
 Available system pagesizes:
  4K, 2048K
 Supported system pagesize(s):
  PAGESIZE  AVAILABLE_PAGES  EXPECTED_PAGES  ALLOCATED_PAGES  ERROR(s)
     2048K              336             336             336        NONE
 Reason for not supporting certain system pagesizes:
  4K - Large pagesizes only 
**********************************************************************

Note:
HugePages are now in use automatically, and the allocation matches the SGA size we configured, so very little is wasted.
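
The 336 pages in the alert log follow directly from the 672M SGA and the 2048 kB page size. The arithmetic, as a small sketch (the function name is illustrative):

```shell
#!/bin/sh
# Number of large pages needed to cover an SGA, rounding up.
# $1 = SGA size in MB, $2 = Hugepagesize in kB
pages_needed() {
    sga_kb=$(( $1 * 1024 ))
    page_kb=$2
    echo $(( (sga_kb + page_kb - 1) / page_kb ))
}

pages_needed 672 2048    # prints 336
```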

Check the operating system

$ cat /proc/meminfo
MemTotal: 3765384 kB
MemFree: 2246868 kB
MemAvailable: 2721796 kB
Buffers: 2108 kB
Cached: 515388 kB
Shmem: 8800 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 336
HugePages_Free: 3
HugePages_Rsvd: 3
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 67520 kB
DirectMap2M: 4126720 kB

$ sysctl -a|grep page
vm.hugepages_treat_as_movable = 0
vm.nr_hugepages = 336
vm.nr_hugepages_mempolicy = 336
vm.nr_overcommit_hugepages = 0
vm.page-cluster = 3

[oracle@oel7db1 dbs]$ cat /etc/sysctl.conf |grep page
[oracle@oel7db1 dbs]$

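Note:
vm.nr_hugepages has been changed and HugePages are configured, but /etc/sysctl.conf is untouched, so after an OS reboot the HugePages are released, or sized only as /etc/sysctl.conf specifies. So which process allocated the memory? The db alert log above shows OS PID 3579.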
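
To see the gap between the running kernel value and what survives a reboot, the persisted setting can be read from the sysctl file and compared with `sysctl -n vm.nr_hugepages`. A minimal sketch, with a made-up function name and a simplified `key = value` parser:

```shell
#!/bin/sh
# Print the vm.nr_hugepages value persisted in a sysctl-style file,
# or nothing when the key is absent (as in the /etc/sysctl.conf above).
# $1 = file to read (defaults to /etc/sysctl.conf)
persisted_nr_hugepages() {
    awk -F= '{
        key = $1; gsub(/[ \t]/, "", key)
        if (key == "vm.nr_hugepages") { val = $2; gsub(/[ \t]/, "", val) }
    }
    END { if (val != "") print val }' "${1:-/etc/sysctl.conf}"
}

# Example:
#   echo "running:   $(sysctl -n vm.nr_hugepages)"
#   echo "persisted: $(persisted_nr_hugepages)"
```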

[oracle@oel7db1 ~]$ ps -ef|grep 3579|grep -v grep
root      3579     1  0 08:19 ?        00:00:00 ora_dism_anbob19c

[oracle@oel7db1 ~]$ ls -l `which oradism`
-rwsr-x--- 1 root oinstall 147848 Apr 17  2019 /u01/app/oracle/product/19.2.0/db_1/bin/oradism

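Note:
It is the oradism process that grew the number of HugePages in the Linux kernel; the process runs as root. Also, if the HugePages preallocated by Linux are already enough for the instance at startup, oradism is not invoked: the process does not appear in the alert log and is not started with the instance.

I also found that in this environment, when the database startup is traced with strace, an instance with AUTO_ONLY cannot start. A likely reason is that Linux ignores the setuid bit when a traced process executes a binary, so oradism runs without root privilege, which matches the alert log below.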
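
Since the automatic allocation depends on oradism being setuid root, it is worth checking the binary's ownership and mode before relying on AUTO_ONLY. A small sketch of such a check (the function name is illustrative):

```shell
#!/bin/sh
# Report whether a binary is owned by root and has the setuid bit set.
# $1 = path to the binary, e.g. the output of `which oradism`
check_setuid_root() {
    f=$1
    [ -e "$f" ] || { echo "missing"; return 2; }
    owner=$(ls -ln "$f" | awk '{ print $3 }')   # numeric uid of the owner
    if [ -u "$f" ] && [ "$owner" = "0" ]; then
        echo "setuid-root"
    elif [ -u "$f" ]; then
        echo "setuid, but not owned by root"
    else
        echo "no setuid bit"
    fi
}

# Example:
#   check_setuid_root "$ORACLE_HOME/bin/oradism"
```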

[oracle@oel7db1 ~]$ strace -f -o output.txt sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Wed Nov 4 08:19:02 2020
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.
Connected to an idle instance.

SQL> startup
ORA-27106: system pages not available to allocate memory
Additional information: 6122
Additional information: 2
Additional information: 3
SQL> exit

# alert log
Starting ORACLE instance (normal) (OS id: 1928)
****************************************************
 Sys-V shared memory will be used for creating SGA
 ****************************************************
WARNING: -------------------------------
WARNING: oradism did not start up correctly.
Return code: 16 errno 0 info1 54321 info2 65535
----------------------------------------
Oradism binary does not have root privilege.
Please verify if oradism has required privilege
Oradism spawned failed for large page allocation
ERROR: Failed to get available system pages to allocate memory
**********************************************************************
Dump of system resources acquired for SHARED GLOBAL AREA (SGA)
 Per process system memlock (soft) limit = 128G
 Expected per process system memlock (soft) limit to lock
 instance MAX SHARED GLOBAL AREA (SGA) into memory: 672M
 Available system pagesizes:
  4K, 2048K
 Supported system pagesize(s):
  PAGESIZE  AVAILABLE_PAGES  EXPECTED_PAGES  ALLOCATED_PAGES  ERROR(s)
     2048K                0             336               0   ORA-27102
 Reason for not supporting certain system pagesizes:
  4K - Large pagesizes only
RECOMMENDATION:
 1. Configure system with expected number of pages for every
 supported system pagesize prior to the next instance restart operation.
**********************************************************************
SGA: Realm creation failed

[oracle@oel7db1 ~]$ oerr ora 27106
27106, 00000, "system pages not available to allocate memory"
// *Cause: System page count for supported page sizes was misconfigured.
// *Action: Configure system page count as recommended in the alert file.


[oracle@oel7db1 ~]$  oerr ora 27102
27102, 00000, "out of memory"
// *Cause: Out of memory
// *Action: Consult the trace file for details
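
The page-size table in the alert log is regular enough to scan for failures. The sketch below prints every page size whose ERROR(s) column carries an ORA code; it assumes the five-column layout shown in the excerpts above, and the function name is my own.

```shell
#!/bin/sh
# Print "PAGESIZE ORA-code" for each row of the alert log page-size table
# that reports an error instead of NONE.
# $1 = path to the alert log
pagesize_errors() {
    awk '$1 ~ /^[0-9]+K$/ && NF == 5 && $5 ~ /^ORA-[0-9]+$/ { print $1, $5 }' "$1"
}

# Example (the path is a placeholder):
#   pagesize_errors /path/to/alert_SID.log
```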

-- over --
