Channel: ANBOB

Troubleshooting ORA-600 errors related to memory corruption when using a DBLINK


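In a case from a while ago, several databases suddenly started raising ORA-600 errors related to corrupt blocks. Fortunately, RMAN, DBV, and ANALYZE TABLE ... VALIDATE STRUCTURE all found no actual corrupt blocks, which means the corruption most likely existed only in memory. Both the target and the source were 11.2.0.3.7 two-node RAC databases. It was eventually confirmed that the errors were triggered by a procedure that used a DBLINK; the local DB ran on HP-UX and the remote DB on AIX.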
# Remote database errors

ORA-600 [6101]
ORA-600 [kdsgrp1]
ORA-600 [kdBlkCheckError]
ORA-600 [17182]
ORA-600 [kghfrempty:ds]
ORA-600 [17114]
ORA-600 [6856]
ORA-600 [18301]
ORA-600 [4000]

Error Stack: ORA-600[6101]
Main Stack:
kdxlin <- kco_issue_callback <- kcoapl <- kcbchg1_main <- kcbchg <- ktuapundo <- kdiulk
<- kcoubk <- ktundo <- kturCRBackoutOneChg <- ktrgcm <- ktrget3 <- ktrget2 <- kdifxs1 <- kdifxs
<- qerixtFetch <- qerpfRealFetch <- qerpfFetch <- qertbFetchByRowID <- qergiFetch
<- opifch2 <- opifch <- opiodr <- ttcpip <- opitsk <- opiino <- opiodr <- opidrv <- sou2o
<- opimai_real <- ssthrdmain <- main 1>     

1< ***** Error Stack ***** 
 ORA-00600: internal error code, arguments: [6101], [17], [27], [1], [], [], [], [], [], [], [], []
1< ***** Error Stack *****
 Error 600 in redo application callback
 Dump of change vector:
 TYP:0 CLS:1 AFN:295 DBA:0x49c34feb OBJ:4210502 SCN:0x0ea4.9a21fec2 SEQ:0 OP:10.2 ENC:0 RBL:0
 index redo (kdxlin): insert leaf row
 KTB Redo
 op: 0x02 ver: 0x01
 compat bit: 4 (post-11) padding: 1
 op: C uba: 0x4ac03e52.1a95.1e
 UNDO: SINGLE split flag / CLEAR / -- / -- / --
 itl: 1, sno: 147, row size 27
 insert key: (23):
 07 78 76 0b 05 09 36 25 04 c3 14 01 02 02 c1 02 06 6c 4e a3 12 00 06
 Block after image is corrupt:
 buffer tsn: 29 rdba: 0x49c34feb (295/217067)
 scn: 0x0ea4.9a21fec2 seq: 0x00 flg: 0x00 tail: 0xfec20600
 frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
 Hex dump of block: st=0, typ_found=1
 
2> ***** Current SQL Statement for this session (sql_id=9wz0u7aqusy24) *****
 SELECT ROWID,"CYCLEID","REGION","SUBSID","PARAID","REMINDTIME","TRANSFLAG","FORMNUM" FROM "ANBOB"."XXXX_DETAIL_LOG" "T"
 WHERE "TRANSFLAG"=1 AND "REGION"=:1 AND ("CYCLEID"=:2 OR "CYCLEID"=:3)
 AND "REMINDTIME">TO_DATE(TO_CHAR(:4-1,'yyyymmdd'),'yyyymmdd') AND "FORMNUM" IS NOT NULL AND "PARAID"=:5
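
The `rdba: 0x49c34feb (295/217067)` notation in the dump above packs a relative file number and a block number into one 32-bit value. A minimal sketch of the decoding follows, assuming the usual smallfile layout (upper 10 bits are the file number, lower 22 bits the block number). The helper name `decode_dba` is made up for illustration and is not an Oracle API.

```python
from typing import Tuple


def decode_dba(dba: int) -> Tuple[int, int]:
    """Split a 32-bit data block address into (relative file#, block#).

    Assumption: smallfile-tablespace layout, i.e. the upper 10 bits hold
    the relative file number and the lower 22 bits hold the block number.
    """
    file_no = (dba >> 22) & 0x3FF
    block_no = dba & 0x3FFFFF
    return file_no, block_no


# Value copied from the ORA-600 [6101] dump; the trace prints (295/217067).
print(decode_dba(0x49C34FEB))
```

The same helper can be pointed at the other `rdba` values in this post to cross-check the file/block pairs that the trace prints next to them.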
	  


Error Stack: ORA-600[kdsgrp1]
Main Stack:
kdsgrp1_dump <- kdsgrp1 <- kdsgrp <- qetlbr <- qertbFetchByRowID <- qergiFetch <- opifch2
<- opifch <- opiodr <- ttcpip <- opitsk <- opiino <- opiodr <- opidrv <- sou2o <- opimai_real
<- ssthrdmain <- main 

=====================================================
Error: ORA-00600 [kghfrempty:ds]   ORA-00600  [17182]

Main Stack:

   kghnerror <-        kghfrempty <-
        kghgex <-        kghalf <-        kdbmal <-        kdxd4ckf <-        kdxdmp <-
        ktbtdu <-        ktbdbh <-        ktbdbhw <-        kcbtdu <-        kcbzdh <-
        kcbzsp <-        kssdmp1 <-        kssdmh <-        ksudmc <-        kssdmp1 <-
        kssdmh <-        ksudmp_proc <-        ksudmp <-        kssdmp <-        ksudps <-
        dbkedDefDump <-        ksedmp <-        ksfdmp <-        dbgexPhaseII <-        dbgexExplicitEndInc   <-
         <-        dbgeEndDDEInvocationImpl <-        dbgeEndDDEInvocation <-
        kgherror <-        kghfrf <-        kdbmfr <-        kdb4cpss <-        kdbcpss <-
        kdourp2 <-        kdourp <-        kco_issue_callback <-       kcoapl <-
        kcbchg1_main <-        kcbchg <-        ktuapundo <-        kdoiur <-        kcoubk <-
        ktundo <-        kturCRBackoutOneChg   <-         <-        ktrgcm <-        ktrget3 <-
        ktrget2 <-        kdsgrp <-        qetlbr <-        qertbFetchByRowID <-        +748                  <-
        qergiFetch <-        opifch2 <-        opifch <-        opiodr <-        ttcpip <-
        opitsk <-        opiino <-        opiodr <-        opidrv <-        sou2o <-
        opimai_real <-        ssthrdmain <-        main <- 1>     ***** Error Stack *****
       ORA-00600: internal error code, arguments: [kghfrempty:ds], [0x11093B7E0], [], [], [], [], [], [], [], [], [], []
       ORA-00600: internal error code, arguments: [17182], [0x11093B7F0], [], [], [], [], [], [], [], [], [], []

1<     ***** Error Stack *****
        Total heap size    =318910464
        FREE LISTS:
         Bucket 0 size=88
          Chunk        11093fc58 sz=        0    kghdsx
         Bucket 1 size=280
          Chunk        11093fc88 sz=      744    free      "               "
         Bucket 2 size=1048
        Total free space   =      744
        UNPINNED RECREATABLE CHUNKS (lru first):
        PERMANENT CHUNKS:
          Chunk        11093fc38 sz=       80    perm      "perm           "  alo=80
        Permanent space    =       80
        ******************************************************
         Hla: 255
2< ***** End of Customized Incident Dump(s) *****
*** 2018-10-27 18:08:37.457
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x40)
2>      ***** Current SQL Statement for this session (sql_id=9wz0u7aqusy24) *****
        SELECT ROWID,"CYCLEID","REGION","SUBSID","PARAID","REMINDTIME","TRANSFLAG","FORMNUM" FROM "ANBOB"."XXXX_DETAIL_LOG" "T"
        WHERE "TRANSFLAG"=1 AND "REGION"=:1 AND ("CYCLEID"=:2 OR "CYCLEID"=:3)
        AND "REMINDTIME">TO_DATE(TO_CHAR(:4-1,'yyyymmdd'),'yyyymmdd') AND "FORMNUM" IS NOT NULL AND "PARAID"=:5
2< ***** current_sql_statement *****

==========================================================
Error: ORA-00600 [17114]

>     ***** Error Stack *****
       ORA-00600: internal error code, arguments: [17114], [0x11093B7C8], [], [], [], [], [], [], [], [], [], []
       ORA-00600: internal error code, arguments: [17182], [0x11093B7F0], [], [], [], [], [], [], [], [], [], []
1<     ***** Error Stack *****
      Error 600 in redo application callback
      Dump of change vector:
      TYP:0 CLS:1 AFN:245 DBA:0x3d408e55 OBJ:3877590 SCN:0x0e9f.ddf74211 SEQ:0 OP:11.5 ENC:0 RBL:0    <<<<<<<<<<<
      KTB Redo
      op: 0x02 ver: 0x01
      compat bit: 4 (post-11) padding: 1
      op: C uba: 0x00848c5e.4f79.09
      KDO Op code: URP row dependencies Disabled
        xtype: CR flags: 0x00000000 bdba: 0x3d408e55 hdba: 0xd685eac2
      itli: 1 ispac: 0 maxfr: 9774
      tabn: 0 slot: 49(0x31) flag: 0x2c lock: 0 ckix: 14
      ncol: 67 nnew: 2 size: -15
      col 47: [ 1] 80
      col 52: *NULL*
      Block after image is corrupt:
      buffer tsn: 6 rdba: 0x80018001 (512/98305)
      scn: 0x8001.80018001 seq: 0x80 flg: 0x01 tail: 0x42110600
      frmt: 0x02 chkval: 0x8001 type: 0x13=unknown
      Hex dump of corrupt header 4 = CORRUPT
      Dump of memory from 0x0700000338DCC000 to 0x0700000338DCC014
      700000338DCC000 13023001 80018001 80018001 80018001 [..0.............]
      700000338DCC010 80018001 [....]
      Hex dump of block: st=4, typ_found=0
      Dump of memory from 0x0700000338DCC000 to 0x0700000338DD0000

SQL> @oid 3877590

owner                     object_name                    object_type        SUBOBJECT_NAME                 CREATED           LAST_DDL_TIME     status    DATA_OBJECT_ID
------------------------- ------------------------------ ------------------ ------------------------------ ----------------- ----------------- --------- --------------
ANBOB                   XXXX_DETAIL_LOG      TABLE PARTITION    PART_201810_312                20180919 23:53:26 20180919 23:53:26 VALID            3877590
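
One way to read the "Block after image is corrupt" dumps above is to compare the block header with the block tail. In the commonly described Oracle block layout, the 4-byte tail is built from the low 16 bits of the SCN base, the block type byte, and the sequence byte. The sketch below applies that check to values copied from the dumps in this post. `expected_tail` is a hypothetical helper name, and the layout is an assumption drawn from general knowledge of the block format, not something the trace states.

```python
def expected_tail(scn_base: int, block_type: int, seq: int) -> int:
    """Tail value implied by the header fields of a block.

    Assumed layout: low 16 bits of the SCN base, then the block type
    byte, then the sequence byte.
    """
    return ((scn_base & 0xFFFF) << 16) | ((block_type & 0xFF) << 8) | (seq & 0xFF)


# Header from the [6101] dump: scn base 0x9a21fec2, type 0x06, seq 0x00.
# The trace shows tail: 0xfec20600.
print(hex(expected_tail(0x9A21FEC2, 0x06, 0x00)))

# Header from the [17114] dump: scn base 0x80018001, type 0x13, seq 0x80.
# The trace shows tail: 0x42110600.
print(hex(expected_tail(0x80018001, 0x13, 0x80)))
```

Under that assumption, the first header agrees with its tail and the second does not. The tail of the second block (0x4211, type 0x06, seq 0x00) still agrees with the SCN in the change vector (0x0e9f.ddf74211), which is consistent with the header of the in-memory block image having been overwritten by the repeating 0x8001 pattern.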

==========================================================
Error: ORA-00600 [6856]

Error Stack: ORA-600[6856] [0], [28], [], [], [], [], [], [], [], [], []
Main Stack:
dbgePostErrorKGE    <-dbkePostKGE_kgsf <
kgeade <-kgeriv_int <-kgeriv <-kseipre <-ksesic2 <-
kdbmrd <-kdoqmd <-kco_issue_callback      <-kcoapl <-
kcbchg1_main <-kcbchg <-ktuapundo <-kdoiur <-kcoubk <-
ktundo <-kturCRBackoutOneChg   <-ktrgcm <-ktrget3 <-
ktrget2 <-kdsgrp <-qetlbr <-qertbFetchByRowID      <-
qergiFetch <-qergsFetch <-opifch2 <-kpoal8 <-opiodr <-
ttcpip <-opitsk <-opiino <-opiodr <-opidrv <-
sou2o <-opimai_real <-ssthrdmain <-main <-_
 
Error 607 in redo application callback
Dump of change vector:
TYP:0 CLS:1 AFN:874 DBA:0xda802610 OBJ:3640041 SCN:0x0ea1.d72a508b SEQ:0 OP:11.12 ENC:0 RBL:0   <<<<<<<<<<<<<
KTB Redo
op: 0x04  ver: 0x01
compat bit: 4 (post-11) padding: 1
op: L  itl: xid:  0x0cfd.00d.00b6cac5 uba: 0xbdc44229.f110.08
                      flg: C---    lkc:  0     scn: 0x0ea1.d6ebe91f
KDO Op code: QMD row dependencies Disabled
  xtype: CR flags: 0x00000000  bdba: 0xda802610  hdba: 0xd948a145
itli: 2  ispac: 0  maxfr: 9774
tabn: 0 lock: 0 nrow: 2
slot[0]: 28
slot[1]: 29
Block after image is corrupt:
buffer tsn: 8 rdba: 0xda802610 (874/9744)
scn: 0x0ea1.d72a508b seq: 0x00 flg: 0x00 tail: 0x508b0600
frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
Hex dump of block: st=0, typ_found=1

...
Block header dump:  0xda802610
 Object id on Block? Y
 seg/obj: 0x378ae9  csc: 0xea1.d728b4b9  itc: 2  flg: E  typ: 1 - DATA
     brn: 0  bdba: 0xda802540 ver: 0x01 opc: 0
     inc: 0  exflg: 0

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0cb6.035.00b3a91d  0xbdc0b840.fc03.1d  C---    0  scn 0x0ea1.d6ec07a9
0x02   0x0db6.028.00cc849a  0xbd830df9.dffe.0b  C---    0  scn 0x0ea1.d6ffe970
bdba: 0xda802610
data_block_dump,data header at 0x7000000e0d24064

----- Current SQL Statement for this session (sql_id=aatzumckd8p18) -----
select count(1) from XXXX_DETAIL_LOG t
  where t.cycleid = to_number(to_char(sysdate,'yyyymm')) and t.transflag = 0
    and paraid in
    (
      '190037','190038','190039','190040','190041','190042',
      '190043','190044','190045','190046','190047','190048','190049','190050','190051'
    )
	
==========================================================
Error: ORA-00600 [4000]

Error Stack: ORA-600 [4000], [16220], [], [], [], [], [], [], [], [], [], []
1< ***** Error Stack *****
1>     ***** Dump for incident 855241 (ORA 600 [4000]) *****

       *** 2018-10-26 18:06:30.359
       dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
2>      ***** Current SQL Statement for this session (sql_id=9wz0u7aqusy24) *****
        SELECT ROWID,"CYCLEID","REGION","SUBSID","PARAID","REMINDTIME","TRANSFLAG","FORMNUM" FROM "ANBOB"."XXXX_DETAIL_LOG" "T"
        WHERE "TRANSFLAG"=1 AND "REGION"=:1 AND ("CYCLEID"=:2 OR "CYCLEID"=:3)
        AND "REMINDTIME">TO_DATE(TO_CHAR(:4-1,'yyyymmdd'),'yyyymmdd') AND "FORMNUM" IS NOT NULL AND "PARAID"=:5
2<      ***** current_sql_statement *****
	
	----- Call Stack Trace -----
skdstdst <-ksedst <-dbkedDefDump <-ksedmp <-ksfdmp <-
$cold_dbgexPhaseII <-)+576       <-dbgexProcessError <-+2096       <-dbgeExecuteForError  call     <-
 <-dbgePostErrorKGE <-2368    <-dbkePostKGE_kgsf <-128      kgeade <-kgesev 
 <-ksesec1 <-npierr <-kpnerr <-kpnpst <-upirtrc <-kpurcsc <-kpufch0 <-kpufch 
 <-OCIStmtFetch <-qerrmOFBu <-qerrmFBu <-qerrmFetch <-qerjotRowProc <-
   <-qersoFetch <-qerjotFetch <-opifch2 <-opifch <-
opipls <-opiodr <-rpidrus <-skgmstack <-rpidru <-
rpiswu2 <-rpidrv <-psddr0 <-psdnal <-pevm_BFTCHC <-
pfrinstr_BFTCHC   <-pfrrun_no_tool    <-pfrrun <-
plsql_run <-peidxr_run <-peidxexe <-kkxdexe <-kkxmpexe <-
kgmexwi <-kgmexec <-evapls <-evaopn2 <-kkxmexcs <-
opiexe <-kpoal8 <-opiodr <-ttcpip <-opitsk <-
opiino <-opiodr <-opidrv <-sou2o <-opimai_real <-
ssthrdmain <-main	

Note:
All of the errors point to the same table.

# Local db errors

Errors in file /oracle/app/oracle/diag/rdbms/weejar/weejar1/trace/weejar1_ora_3914.trc  (incident=3335114):
ORA-00600: internal error code, arguments: [ORA-00600: internal error code, arguments: [4000], [16220], [], [], [], [], [], [], [], [], [], []
], [], [], [], [], [], [], [], [], [], [], []
ORA-02063: preceding line from LNK_ANBOB_C1
Incident details in: /oracle/app/oracle/diag/rdbms/weejar/weejar1/incident/incdir_3335114/weejar1_ora_3914_i3335114.trc	
	ORA-00600: internal error code, arguments: [ORA-00600: internal error code, arguments: [4000], [16220], [], [], [], [], [], [], [], [], [], []
], [], [], [], [], [], [], [], [], [], [], []
ORA-02063: preceding line from LNK_ANBOB_C1

Incident 3335115 created, dump file: /oracle/app/oracle/diag/rdbms/weejar/weejar1/incident/incdir_3335115/weejar1_ora_3914_i3335115.trc
ORA-00700: soft internal error, arguments: [kgerev1], [600], [600], [700], [], [], [], [], [], [], [], []

Incident 3335116 created, dump file: /oracle/app/oracle/diag/rdbms/weejar/weejar1/incident/incdir_3335116/weejar1_ora_3914_i3335116.trc
ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], [], [], [], [], []


Dump continued from file: /oracle/app/oracle/diag/rdbms/weejar/weejar1/trace/weejar1_ora_3914.trc
ORA-00700: soft internal error, arguments: [kgerev1], [600], [600], [700], [], [], [], [], [], [], [], []

========= Dump for incident 3335115 (ORA 700 [kgerev1]) ========

*** 2018-10-26 18:07:24.668
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=c9hrbhcy0t0vt) -----
call ANBOB.LOC_KAFKA_TOPIC_NOTIFY_PRO(319)
----- PL/SQL Stack -----
----- PL/SQL Call Stack -----
  object      line  object
  handle    number  name
c0000013d53e4c10        67  procedure ANBOB.LOC_KAFKA_TOPIC_NOTIFY_PRO

----- Call Stack Trace -----
skdstdst <-ksedst <-dbkedDefDump <-ksedmp <-ksfdmp <-
$cold_dbgexPhaseII <-)+576   <-dbgexProcessError  <-dbgeExecuteForError  call     <-
 <-dbgePostErrorKGE <-2368    <-dbkePostKGE_kgsf <-
kgeade <-kgesev <-ksesec1 <-npierr <-kpnerr <-
kpnpst <-upirtrc <-kpurcsc <-kpufch0 <-kpufch <-
OCIStmtFetch <-qerrmOFBu <-qerrmFBu <-qerrmFetch <-qerjotRowProc <-
<-qersoFetch <-qerjotFetch <-opifch2 <-opifch <-
opipls <-opiodr <-rpidrus <-skgmstack <-rpidru <-
rpiswu2 <-rpidrv <-psddr0 <-psdnal <-pevm_BFTCHC <-
pfrinstr_BFTCHC  <-pfrrun_no_tool <-pfrrun <-
plsql_run <-peidxr_run <-peidxexe <-kkxdexe <-kkxmpexe <-
kgmexwi <-kgmexec <-evapls <-evaopn2 <-kkxmexcs <-
opiexe <-kpoal8 <-opiodr <-ttcpip <-opitsk <-
opiino <-opiodr <-opidrv <-sou2o <-opimai_real <-
ssthrdmain <-main 


# Stored procedure pseudocode
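procedure ANBOB.LOC_KAFKA_TOPIC_NOTIFY_PRO
is
begin
  -- cursor loop: fetch rows (including ROWID) from the remote table over the DB link
  for i in (select xx, rowid from XXXX_DETAIL_LOG@LNK_ANBOB_C1 where xx) loop
    if xx then
      -- row-by-row remote update, addressed by the ROWID fetched above
      update XXXX_DETAIL_LOG@LNK_ANBOB_C1 t set xx where t.rowid = i.rowid;
    end if;
  end loop;
end;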

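For the errors above, flushing the buffer cache, restarting the instance, and recreating the index all failed to work around the problem. After the table was rebuilt with CTAS, the errors did not recur for a short while, but the reason is not confirmed. None of the currently known bugs matches completely, although the symptoms closely resemble Bug 20368850: ORA-600 [KDXLIN: SNO OUT OF RANGE].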

The procedure may have changed many keys in an index leaf block that other sessions needed to read concurrently. When a consistent read rolls those changes back and the block has too little space to fit a key back in, Oracle compacts the block and shrinks the ITL, and at that point it may use an improper function and corrupt the block.

Bug 20368850

PROBLEM DESCRIPTION:

Block corruption from kdxlin() during undo apply for CR.
.
In the failing case, the situation is as follows: an index leaf block has
some keys deleted by several transaction. The presence of multiple
transactions increases the itl size to 8 entries.
.
During cr, we rollback the transactions (7 in total). When we come to
process the last of these, there is insufficient space in the block
to fit in the key. This txn uses itl entry 4. We compact the block and
then shrink the itl in kdxlin() by calling ktbsit(). However, during cr
we do not maintain the itl entries the same as for regular undo. So in
ktbsit() we do not see itl entry 4 as active, we end up shrinking the
list right back to 2 entries.

After the space is recovered, the key is inserted and all is well. Then
at the end of kdxlin() we process the transaction layer undo via a call
to ktbair(). Because this is using the itl index from the redo – 4 in
our case – we end up changing itl entry 4 even though there are only 2
itl entries after the compact/shrink done earlier. The result is that
we end up corrupting the row index. Depending on the block layout and
degree of itl shrink, we can also corrupt the block header.

FIX DESCRIPTION:

I amended kdxlin() so that we now call ktbShrinkItlsToLimit() using
the itl number from the redo as a lower limit for itl shrinking. Before
this call I assert that the itl index from the redo is within the itl of
the block.

 

Summary:

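In the end, the cause was confirmed to be a newly deployed procedure that used a DBLINK and updated rows by ROWID. Running DML over a DBLINK is generally not recommended. Because this was not critical business, the stored procedure was eventually taken offline. If possible, try to avoid ROWID, or move the object of the DML to the local instance.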


Troubleshooting a 12c node 2 CRS start failure with ORA-12547 and ORA-15077 in Flex ASM: a case study


Flex ASM

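Before 12c, a database instance connected to the ASM instance using operating-system authentication, because the ASM client (the DB instance) and the ASM server were always on the same host. The Flex ASM architecture introduced in 12c allows database instances to run on different hosts from ASM, authenticating with the Flex ASM user password file. The ASM password file is stored in an ASM disk group, and an ASM user is created by default when Flex ASM is created. Flex ASM also supports pre-12c RDBMS versions, and it is the recommended ASM architecture.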

ASM Network

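With Flex ASM, Oracle 12c introduced a new type of network called the ASM network. It is used for communication between ASM and ASM clients, and among all nodes. All ASM clients in the cluster can access every ASM network. A single network can also be configured to serve as both the private network and the ASM network.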

ASM Listeners

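ASM listeners support Flex ASM access. A set of ASM listeners is configured for each ASM network. Each ASM client database instance registers up to three ASM listener addresses as remote listeners, and all client connections are load-balanced across the whole set of ASM instances. The default listener name is ASMNET1LSNR_ASM.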

Case

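With that background, let's analyze a recent Flex ASM case. The environment is a 12c R2 two-node RAC with Data Guard. While checking the standby side, I found that the instance on standby node 2 was not started, and redo was being received and applied on standby node 1. The problem showed up when I tried to start instance 2.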

root@anbobstb02:~$crsctl start crs
CRS-4123: Oracle High Availability Services has been started.

root@anbobstb02:~$crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4534: Cannot communicate with Event Manager

grid@anbobstb02$ crsctl stat res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details       
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       anbobstb02                  STABLE
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       anbobstb02                  STABLE
ora.crf
      1        ONLINE  ONLINE       anbobstb02                  STABLE
ora.crsd
      1        ONLINE  OFFLINE                               STABLE
ora.cssd
      1        ONLINE  ONLINE       anbobstb02                  STABLE
ora.cssdmonitor
      1        ONLINE  ONLINE       anbobstb02                  STABLE
ora.ctssd
      1        ONLINE  ONLINE       anbobstb02                  OBSERVER,STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.drivers.acfs
      1        ONLINE  ONLINE       anbobstb02                  STABLE
ora.evmd
      1        ONLINE  INTERMEDIATE anbobstb02                  STABLE
ora.gipcd
      1        ONLINE  ONLINE       anbobstb02                  STABLE
ora.gpnpd
      1        ONLINE  ONLINE       anbobstb02                  STABLE
ora.mdnsd
      1        ONLINE  ONLINE       anbobstb02                  STABLE
ora.storage
      1        ONLINE  ONLINE       anbobstb02                  STABLE
--------------------------------------------------------------------------------

grid@anbobstb02:~$crsctl get cluster mode status
Cluster is running in "flex" mode

Note:
This is a Flex ASM environment, and CRS failed to start on node 2.
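
With this many rows in the `crsctl stat res -t -init` listing, it is easy to miss the ones that matter. The sketch below picks out resources whose Target is ONLINE while the State is something else. It assumes the two-line layout shown above (a resource name line followed by numbered instance rows). `resources_not_online` is a hypothetical helper, not part of any Oracle tooling.

```python
from typing import List, Tuple


def resources_not_online(crsctl_output: str) -> List[Tuple[str, str, str]]:
    """Return (resource, target, state) for numbered rows whose target is
    ONLINE but whose state is anything else."""
    problems = []
    current = None
    for line in crsctl_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("ora."):
            current = stripped  # remember the resource the next rows belong to
            continue
        fields = stripped.split()
        # Instance rows look like: "1  ONLINE  OFFLINE  [server]  details"
        if current and len(fields) >= 3 and fields[0].isdigit():
            target, state = fields[1], fields[2]
            if target == "ONLINE" and state != "ONLINE":
                problems.append((current, target, state))
    return problems


# A few rows copied from the node 2 output above.
sample = """\
ora.crsd
      1        ONLINE  OFFLINE                               STABLE
ora.cssd
      1        ONLINE  ONLINE       anbobstb02                  STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.evmd
      1        ONLINE  INTERMEDIATE anbobstb02                  STABLE
"""
for row in resources_not_online(sample):
    print(row)
```

On the rows above this flags `ora.crsd` and `ora.evmd`, which lines up with the CRS-4535 and CRS-4534 messages from `crsctl check crs`.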

CRS alert log

2019-02-12 10:50:05.229 [OCSSD(51008)]CRS-1713: CSSD daemon is started in hub mode
2019-02-12 10:50:06.670000 +08:00
2019-02-12 10:50:06.670 [OCSSD(51008)]CRS-1707: Lease acquisition for node anbobstb02 number 2 completed
2019-02-12 10:50:07.756000 +08:00
2019-02-12 10:50:07.756 [OCSSD(51008)]CRS-1605: CSSD voting file is online: /dev/asm-disk55; details in /oracle/app/grid/diag/crs/anbobstb02/crs/trace/ocssd.trc.
2019-02-12 10:50:07.759 [OCSSD(51008)]CRS-1605: CSSD voting file is online: /dev/asm-disk52; details in /oracle/app/grid/diag/crs/anbobstb02/crs/trace/ocssd.trc.
2019-02-12 10:50:07.763 [OCSSD(51008)]CRS-1605: CSSD voting file is online: /dev/asm-disk51; details in /oracle/app/grid/diag/crs/anbobstb02/crs/trace/ocssd.trc.
2019-02-12 10:50:14.376000 +08:00
2019-02-12 10:50:14.376 [OCSSD(51008)]CRS-1601: CSSD Reconfiguration complete. Active nodes are anbobstb01 anbobstb02 .
2019-02-12 10:50:17.177000 +08:00
2019-02-12 10:50:17.176 [OCTSSD(55374)]CRS-8500: Oracle Clusterware OCTSSD process is starting with operating system process ID 55374
2019-02-12 10:50:17.193 [OCSSD(51008)]CRS-1720: Cluster Synchronization Services daemon (CSSD) is ready for operation.
2019-02-12 10:50:18.157 [OCTSSD(55374)]CRS-2403: The Cluster Time Synchronization Service on host anbobstb02 is in observer mode.
2019-02-12 10:50:19.266000 +08:00
2019-02-12 10:50:19.266 [OCTSSD(55374)]CRS-2407: The new Cluster Time Synchronization Service reference node is host anbobstb01.
2019-02-12 10:50:19.266 [OCTSSD(55374)]CRS-2401: The Cluster Time Synchronization Service started on host anbobstb02.
2019-02-12 10:50:35.725000 +08:00
2019-02-12 10:50:35.725 [ORAROOTAGENT(50588)]CRS-5019: All OCR locations are on ASM disk groups [OCRDG], 
and none of these disk groups are mounted. Details are at "(:CLSN00140:)" in "/oracle/app/grid/diag/crs/anbobstb02/crs/trace/ohasd_orarootagent_root.trc".

Trace ohasd_orarootagent_root.trc
adrci> show trace /oracle/app/grid/diag/crs/anbobstb02/crs/trace/ohasd_orarootagent_root.trc

2019-02-12 10:50:25.765 : AGFW:2530133760: {0:5:3} Agent sending reply for: RESOURCE_START[ora.cluster_interconnect.haip 1 1] ID 4098:403
2019-02-12 10:50:25.765 : USRTHRD:2519627520: {0:5:3} Check: 0-1
2019-02-12 10:50:25.766 : AGFW:2530133760: {0:5:3} ora.cluster_interconnect.haip 1 1 state changed from: STARTING to: ONLINE
2019-02-12 10:50:25.766 : AGFW:2530133760: {0:5:3} RECYCLE_AGENT attribute not found
2019-02-12 10:50:25.766 : AGFW:2530133760: {0:5:3} Started implicit monitor for [ora.cluster_interconnect.haip 1 1] interval=30000 delay=30000
2019-02-12 10:50:25.766 : AGFW:2530133760: {0:5:3} Agent sending last reply for: RESOURCE_START[ora.cluster_interconnect.haip 1 1] ID 4098:403
2019-02-12 10:50:25.768 : USRTHRD:2512111360: {0:5:3} got pubgrpdata, 1-8-2-2-2
2019-02-12 10:50:25.770 : USRTHRD:2512111360: {0:5:3} Completed 1 HAIP assignment, start complete
2019-02-12 10:50:25.770 : USRTHRD:2512111360: {0:5:3} to verify inf event
2019-02-12 10:50:25.813 : AGFW:2530133760: {0:5:3} Agent received the message: RESOURCE_START[ora.storage 1 1] ID 4098:438
2019-02-12 10:50:25.813 : AGFW:2530133760: {0:5:3} Preparing START command for: ora.storage 1 1
2019-02-12 10:50:25.813 : AGFW:2530133760: {0:5:3} ora.storage 1 1 state changed from: OFFLINE to: STARTING
2019-02-12 10:50:25.813 : AGFW:2530133760: {0:5:3} RECYCLE_AGENT attribute not found
2019-02-12 10:50:25.813 :CLSDYNAM:2519627520: [ora.storage]{0:5:3} [start] (:CLSN00107:) clsn_agent::start {
2019-02-12 10:50:25.814 :CLSDYNAM:2519627520: [ora.storage]{0:5:3} [start] StorageAgent::init NodeRole = 1
2019-02-12 10:50:25.814 :CLSDYNAM:2519627520: [ora.storage]{0:5:3} [start] StorageAgent::check NODEROLE_HUB getOCRdetails
2019-02-12 10:50:25.832 : default:2519627520: clsvactversion:4: Retrieving Active Version from local storage.
2019-02-12 10:50:25.840 :GIPCXCPT:2519627520: gipcInternalSetAttribute: failed during gipcInternalSetAttribute, ret gipcretInvalidAttribute (5)
2019-02-12 10:50:25.840 :GIPCXCPT:2519627520: gipcSetAttributeNativeF [clscrsconGipcConnect : clscrscon.c : 655]: EXCEPTION[ ret gipcretInvalidAttribute (5) ] failure for obj 0x7f8664436460 [0000000000000955] { gipcEndpoint : localAddr '', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp (nil) status 13flags 0x20000000, flags-2 0x0, usrFlags 0x0 }, name 'traceLevel', val 0x7f86962cf004, len 4, flags 0x0
2019-02-12 10:50:25.855 : CLSNS:2519627520: clsns_SetTraceLevel:trace level set to 1.
2019-02-12 10:50:25.859 : default:2519627520: Inited LSF context: 0x7f866453c5d0
2019-02-12 10:50:25.863 : CLSCRED:2519627520: clsCredCommonInit: Inited singleton credctx.
2019-02-12 10:50:25.863 : CLSCRED:2519627520: (:CLSCRED0101:)clsCredDomInitRootDom: Using user given storage context for repository access.
2019-02-12 10:50:25.886 : USRTHRD:2519627520: {0:5:3} 8154 Error 4 querying length of attr ASM_DISCOVERY_ADDRESS
2019-02-12 10:50:25.889 : USRTHRD:2519627520: {0:5:3} 8154 Error 4 querying length of attr ASM_STATIC_DISCOVERY_ADDRESS
2019-02-12 10:50:25.924 : CLSCRED:2519627520: (:CLSCRED1079:)clsCredOcrKeyExists: Obj dom : SYSTEM.credentials.domains.root.ASM.Self.bb15e951dcbc4fc2ff3aec3bfe1f0424.root not found
2019-02-12 10:50:25.924 : USRTHRD:2519627520: {0:5:3} 7872 Error 4 opening dom root in 0x7f866441d590
2019-02-12 10:50:25.929 :GIPCXCPT:2519627520: gipcInternalSetAttribute: failed during gipcInternalSetAttribute, ret gipcretInvalidAttribute (5)
2019-02-12 10:50:25.929 :GIPCXCPT:2519627520: gipcSetAttributeNativeF [clscrsconGipcConnect : clscrscon.c : 655]: EXCEPTION[ ret gipcretInvalidAttribute (5) ] failure for obj 0x7f86647145d0 [0000000000000fc1] { gipcEndpoint : localAddr '', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp (nil) status 13flags 0x20000000, flags-2 0x0, usrFlags 0x0 }, name 'traceLevel', val 0x7f86962cf004, len 4, flags 0x0
2019-02-12 10:50:26.014 : AGFW:2743070784: Recvd request to shed the threads
2019-02-12 10:50:26.014 :CLSFRAME:2743070784: TM [MultiThread] is changing desired thread # to 8. Current # is 9
2019-02-12 10:50:26.014 :CLSFRAME:2532235008: {0:1:5} Worker thread is exiting in TM [MultiThread] to meet the desired count of 8. New count is 8



2019-02-12 10:50:29.219 : USRTHRD:2519627520: {0:5:3} 7872 Error 4 opening dom root in 0x7f866465b680
2019-02-12 10:50:29.222 :GIPCXCPT:2519627520: gipcInternalSetAttribute: failed during gipcInternalSetAttribute, ret gipcretInvalidAttribute (5)
2019-02-12 10:50:29.222 :GIPCXCPT:2519627520: gipcSetAttributeNativeF [clscrsconGipcConnect : clscrscon.c : 655]: EXCEPTION[ ret gipcretInvalidAttribute (5) ] failure for obj 0x7f8664546cd0 [0000000000002068] { gipcEndpoint : localAddr '', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp (nil) status 13flags 0x20000000, flags-2 0x0, usrFlags 0x0 }, name 'traceLevel', val 0x7f86962cf004, len 4, flags 0x0

2019-02-12 10:50:34.664 : USRTHRD:2519627520: {0:5:3} 7872 Error 4 opening dom root in 0x7f86645abf50
2019-02-12 10:50:34.668 :GIPCXCPT:2519627520: gipcInternalSetAttribute: failed during gipcInternalSetAttribute, ret gipcretInvalidAttribute (5)
2019-02-12 10:50:34.668 :GIPCXCPT:2519627520: gipcSetAttributeNativeF [clscrsconGipcConnect : clscrscon.c : 655]: EXCEPTION[ ret gipcretInvalidAttribute (5) ] failure for obj 0x7f8664707b20 [000000000000392d] { gipcEndpoint : localAddr '', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp (nil) status 13flags 0x20000000, flags-2 0x0, usrFlags 0x0 }, name 'traceLevel', val 0x7f86962cf004, len 4, flags 0x0
2019-02-12 10:50:35.715 : USRTHRD:2509133568: HAIP: event GIPCD_METRIC_UPDATE
2019-02-12 10:50:35.715 : USRTHRD:2512111360: {0:5:3} to verify inf event
2019-02-12 10:50:35.724 : default:2519627520: clsCredDomClose: Credctx deleted 0x7f866443f840
2019-02-12 10:50:35.724 : USRTHRD:2519627520: {0:5:3} --- trace dump on error exit ---
2019-02-12 10:50:35.724 : USRTHRD:2519627520: {0:5:3} Error [kgfoAl06] in [kgfokge] at kgfo.c:3115
2019-02-12 10:50:35.724 : USRTHRD:2519627520: {0:5:3} ORA-12547: TNS:lost contact
ORA-12547: TNS:lost contact
ORA-15077: could not locate ASM instance serving a required diskgroup

2019-02-12 10:50:35.724 : USRTHRD:2519627520: {0:5:3} Category: 7
2019-02-12 10:50:35.724 : USRTHRD:2519627520: {0:5:3} DepInfo: 12547
2019-02-12 10:50:35.724 : USRTHRD:2519627520: {0:5:3} --- trace dump end ---
2019-02-12 10:50:35.724 :CLSDYNAM:2519627520: [ora.storage]{0:5:3} [start] StorageAgent::parsekgforetcodes retcode = 7, kgfoCheckMount(OCRDG), flag 2
2019-02-12 10:50:35.724 :CLSDYNAM:2519627520: [ora.storage]{0:5:3} [start] (null) category: 7, operation: kgfoAl06, loc: kgfokge, OS error: 12547, other: ORA-12547: TNS:lost contact
ORA-12547: TNS:lost contact
ORA-15077: could not locate ASM instance serving a required diskgroup
2019-02-12 10:50:35.724 :CLSDYNAM:2519627520: [ora.storage]{0:5:3} [start] StorageAgent::check kgfo returncode 1
2019-02-12 10:50:35.724 :CLSDYNAM:2519627520: [ora.storage]{0:5:3} [start] (:CLSN00140:)StorageAgent::parsekgforretcodes OCR dgName OCRDG state 1

Note:
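Judging from the logs, CRS could not find the ASM disk group at startup: an error occurred while retrieving the ASM credentials, reported as ORA-12547 and ORA-15077. In Flex ASM, the ASM server connects to all ASM networks when it starts. The next step is to check the ASM listener on node 1.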
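
Agent traces like the one above are long, and the ORA- codes are buried between framework messages. A small sketch for pulling them out and counting them follows. The sample text is copied from the trace above, and `count_ora_codes` is an illustrative helper name rather than an existing tool.

```python
import re
from collections import Counter

# ORA- followed by exactly five digits, e.g. ORA-12547
ORA_CODE = re.compile(r"\bORA-(\d{5})\b")


def count_ora_codes(trace_text: str) -> Counter:
    """Count how often each ORA-nnnnn code appears in a piece of trace text."""
    return Counter(ORA_CODE.findall(trace_text))


# Lines copied from ohasd_orarootagent_root.trc above.
sample = """\
2019-02-12 10:50:35.724 : USRTHRD:2519627520: {0:5:3} ORA-12547: TNS:lost contact
ORA-12547: TNS:lost contact
ORA-15077: could not locate ASM instance serving a required diskgroup
"""
for code, hits in count_ora_codes(sample).most_common():
    print(f"ORA-{code}: {hits}")
```

For a whole trace file, the same function can be called on the file contents read with `open(path).read()`.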

grid@anbobstb01:/home/grid> crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARCHDG.dg
               ONLINE  ONLINE       anbobstb01                  STABLE
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       anbobstb01                  STABLE
ora.DATADG.dg
               ONLINE  ONLINE       anbobstb01                  STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       anbobstb01                  STABLE
ora.MGMT.dg
               ONLINE  ONLINE       anbobstb01                  STABLE
ora.OCRDG.dg
               ONLINE  ONLINE       anbobstb01                  STABLE
ora.chad
               ONLINE  ONLINE       anbobstb01                  STABLE
ora.net1.network
               ONLINE  ONLINE       anbobstb01                  STABLE
ora.ons
               ONLINE  ONLINE       anbobstb01                  STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       anbobstb01                  STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       anbobstb01                  169.254.143.82 192.1
                                                             68.43.33,STABLE
ora.asm
      1        ONLINE  ONLINE       anbobstb01                  Started,STABLE
      2        ONLINE  OFFLINE                               STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.cvu
      1        ONLINE  ONLINE       anbobstb01                  STABLE
ora.mgmtdb
      1        ONLINE  ONLINE       anbobstb01                  Open,STABLE
ora.anbobstb01.vip
      1        ONLINE  ONLINE       anbobstb01                  STABLE
ora.anbobstb02.vip
      1        ONLINE  INTERMEDIATE anbobstb01                  FAILED OVER,STABLE
ora.qosmserver
      1        ONLINE  ONLINE       anbobstb01                  STABLE
ora.rptstby.db
      1        OFFLINE OFFLINE                               Instance Shutdown,ST
                                                             ABLE
      2        ONLINE  ONLINE       anbobstb01                  Open,Readonly,HOME=/
                                                             oracle/app/oracle/pr
                                                             oduct/12.2.0/db_1,ST
                                                             ABLE
ora.scan1.vip
      1        ONLINE  ONLINE       anbobstb01                  STABLE
--------------------------------------------------------------------------------
grid@anbobstb01:/home/grid> crsctl stat res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details       
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       anbobstb01                  Started,STABLE
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       anbobstb01                  STABLE
ora.crf
      1        ONLINE  ONLINE       anbobstb01                  STABLE
ora.crsd
      1        ONLINE  ONLINE       anbobstb01                  STABLE
ora.cssd
      1        ONLINE  ONLINE       anbobstb01                  STABLE
ora.cssdmonitor
      1        ONLINE  ONLINE       anbobstb01                  STABLE
ora.ctssd
      1        ONLINE  ONLINE       anbobstb01                  OBSERVER,STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.drivers.acfs
      1        ONLINE  ONLINE       anbobstb01                  STABLE
ora.evmd
      1        ONLINE  ONLINE       anbobstb01                  STABLE
ora.gipcd
      1        ONLINE  ONLINE       anbobstb01                  STABLE
ora.gpnpd
      1        ONLINE  ONLINE       anbobstb01                  STABLE
ora.mdnsd
      1        ONLINE  ONLINE       anbobstb01                  STABLE
ora.storage
      1        ONLINE  ONLINE       anbobstb01                  STABLE
--------------------------------------------------------------------------------

grid@anbobstb01:/home/grid> ocrdump /tmp/ocr.dmp
PROT-310: Not all keys were dumped due to permissions.
grid@anbobstb01:/home/grid> vi /tmp/ocr.dmp

[SYSTEM.ASM.CREDENTIALS.USERS.CRSUSER__ASM_001]
ORATEXT : bb15e951dcbc4fc2ff3aec3bfe1f0424:grid   -- the credentials exist
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_READ, OTHER_PERMISSION : PROCR_NONE, USER_NAME : grid, GROUP_NAME : oinstall}

grid@anbobstb01:~$oifcfg getif
bond0  133.96.43.0  global  public
bond1  192.168.43.0  global  cluster_interconnect,asm

grid@anbobstb01:/oracle/app/12.2.0/grid/bin> ps -ef|grep lsnr
grid      1411     1  0  2018 ?        00:01:53 /oracle/app/12.2.0/grid/bin/tnslsnr LISTENER_SCAN1 -no_crs_notify -inherit
grid      6585     1  0  2018 ?        00:00:21 /oracle/app/12.2.0/grid/bin/tnslsnr listener_dg -inherit
grid     25826 19172  0 10:57 pts/3    00:00:00 grep --color=auto lsnr
grid     72783     1  0  2018 ?        00:02:01 /oracle/app/12.2.0/grid/bin/tnslsnr MGMTLSNR -no_crs_notify -inherit
grid     78242     1  0 Feb12 ?        00:00:05 /oracle/app/12.2.0/grid/bin/tnslsnr LISTENER -no_crs_notify -inherit
grid     80521     1  0 Feb12 ?        00:00:35 /oracle/app/12.2.0/grid/bin/tnslsnr ASMNET1LSNR_ASM -no_crs_notify -inherit

grid@anbobstb01:/oracle/app/12.2.0/grid/bin> lsnrctl status ASMNET1LSNR_ASM
LSNRCTL for Linux: Version 12.2.0.1.0 - Production on 13-FEB-2019 10:57:43
Copyright (c) 1991, 2016, Oracle.  All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=ASMNET1LSNR_ASM)))
STATUS of the LISTENER
------------------------
Alias                     ASMNET1LSNR_ASM
Version                   TNSLSNR for Linux: Version 12.2.0.1.0 - Production
Start Date                12-FEB-2019 18:21:02
Uptime                    0 days 16 hr. 36 min. 40 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /oracle/app/12.2.0/grid/network/admin/listener.ora
Listener Log File         /oracle/app/grid/diag/tnslsnr/anbobstb01/asmnet1lsnr_asm/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=ASMNET1LSNR_ASM)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.43.33)(PORT=1526)))
The listener supports no services
The command completed successfully

grid@anbobstb01:/oracle/app/grid/diag/tnslsnr/anbobstb01/asmnet1lsnr_asm/trace> vi asmnet1lsnr_asm.log

2019-02-13T10:57:16.203184+08:00
Incoming connection from 192.168.43.34 rejected
13-FEB-2019 10:57:16 * 12546
TNS-12546: TNS:permission denied
 TNS-12560: TNS:protocol adapter error
  TNS-00516: Permission denied
  

grid@anbobstb01:/oracle/app/grid/diag/tnslsnr/anbobstb01/asmnet1lsnr_asm/trace> telnet 192.168.43.33 1526
Trying 192.168.43.33...
Connected to 192.168.43.33.
Escape character is '^]'.
Connection closed by foreign host.


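Note:
From instance 1 everything currently in use looks fine, but the ASM listener running there supports no services, and a telnet to its port is dropped almost immediately. iptables has no restrictions in place, and tcpdump shows that the reset packet is sent by the listener process itself. If the ASM listener has no services, the nodes of the Flex ASM cluster cannot communicate with each other. A restriction on listener connections is most likely to come from sqlnet.ora.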

grid@anbobstb01:/oracle/app/12.2.0/grid/network/admin> vi sqlnet.ora
NAMES.DIRECTORY_PATH= (TNSNAMES,EZCONNECT)

ADR_BASE = /oracle/app/grid
TCP.VALIDNODE_CHECKING=yes
TCP.INVITED_NODES=(...)

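Note:
Sure enough, sqlnet.ora has a whitelist configured. The file was copied from the primary database, and the private network (ASM network) of the primary is on a different subnet from that of the standby, so the whitelist on the standby side does not contain the standby ASM network. That is why the listener has no services.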

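Solution

The fix is then simple: add the ASM network addresses to the whitelist in sqlnet.ora. As a reminder, whenever you set up a listener whitelist, remember to include not only the front-end application IPs but also the PUBLIC NETWORK, PRIVATE NETWORK, SCAN IP, and ASM NETWORK addresses.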
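
One way to double-check a whitelist is to test whether each interconnect address literally appears in TCP.INVITED_NODES. The sketch below is only an illustration: the function name invited_nodes_has and the demo entries 10.0.0.5 and 10.0.0.6 are made up, it assumes the list sits on a single line, and it compares literal entries only (wildcard or CIDR entries are not expanded).

```shell
#!/bin/sh
# Return 0 if the literal entry $2 appears in the TCP.INVITED_NODES list of file $1.
# Assumption: the list is written on one line; wildcard/CIDR entries are not expanded.
invited_nodes_has() {
    file=$1
    wanted=$2
    grep -i '^[[:space:]]*TCP\.INVITED_NODES' "$file" \
        | sed -e 's/^[^=]*=//' -e 's/[()[:space:]]//g' \
        | tr ',' '\n' \
        | grep -Fxq "$wanted"
}

# Demo against a throwaway file; the first two entries are invented for the demo.
demo=$(mktemp)
cat > "$demo" <<'EOF'
TCP.VALIDNODE_CHECKING=yes
TCP.INVITED_NODES=(10.0.0.5, 10.0.0.6, 192.168.43.33)
EOF

# Check both ASM network addresses that show up in the listener output and log.
for ip in 192.168.43.33 192.168.43.34; do
    if invited_nodes_has "$demo" "$ip"; then
        echo "$ip: listed"
    else
        echo "$ip: MISSING"
    fi
done
rm -f "$demo"
```

Against a real system the first argument would be the sqlnet.ora under the grid home's network/admin directory, and the addresses to test would be every node's address on the ASM network.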

A Brief Look at Oracle Database 19c


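Oracle Database 19c is the release that most customers will target for their upgrades, and Oracle has made stability the core goal of this release. In Oracle Database 19c, the developers focused on fixing known issues rather than adding new features. This took hundreds of person-years of testing, with thousands of servers running tests 24 hours a day. The focus on stability goes well beyond the core database functionality; it covers every part of the technology stack, from the installer to the utilities and tools that make up the product. This approach, together with the changes we have made to the patching process, will greatly reduce our customers' patching burden over the coming years.

Before we discuss some of the changes in Oracle Database 19c, it is important to remember that the Oracle Database has been the cornerstone of enterprise systems for the past 40 years. During that time we have added numerous features under the guidance of our customer base; many of these features and their implementations are industry-leading, and in many cases they remain unique to the Oracle Database.

Data is of little value to enterprise users when it cannot be accessed, and the Oracle Database makes sure it is always accessible. This can be as simple as ensuring that the database is consistent when it restarts after an unexpected server outage. It can also mean disaster recovery, where the Oracle Database provides synchronous (or asynchronous) replication of data over long distances while keeping it available for reporting and backups. Oracle Real Application Clusters (RAC) databases can be found in almost every mission-critical system where a server outage could have serious consequences. RAC lets customers scale the Oracle Database to very high throughput and concurrency without changing their applications.

The Oracle Database is widely regarded as one of the most secure data repositories in the industry. No other database solution matches its breadth of features or depth of implementation, from simple access controls to classifying data down to the row level. Oracle can encrypt data throughout its entire lifecycle to minimize malicious access.

Although stability is the focus of Oracle Database 19c, that does not mean there are no new features and enhancements worth mentioning, for example: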

Automatic Indexing

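Automatic indexing is new ground for the industry. Tuning database performance is a challenge for many customers. Deciding which columns of a table need an index, so that it benefits not just a single query but potentially thousands of variations, requires a deep understanding of the data model, of the performance-related features of the Oracle Database, and of the underlying hardware. In Oracle Database 19c we have introduced automatic indexing, which continuously evaluates the executed SQL and the underlying tables to determine which indexes to create and which to potentially drop. It does this with an expert system that validates the improvement an index is likely to bring and, after creation, validates the assumptions it made. It then uses reinforcement learning to make sure it does not repeat the same mistake. Most importantly, Oracle Database 19c is able to adapt over time as the data model and access paths change.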

Active Standby DML

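Active Standby DML redirection: a popular feature of Active Data Guard is the ability to use the standby database for reporting and backups. With basic Data Guard, the standby database continuously applies the redo sent from the primary database. Active Data Guard is a major improvement in making full use of enterprise resources because it puts the standby to work, but many reporting applications need to persist some data, for example to record information for auditing purposes. In Oracle Database 19c, we now allow users to send such write requests to the standby database. The writes are transparently redirected to the primary database and written there first (to ensure consistency), and the changes are then shipped back to the standby database. This approach lets applications use the standby database for moderate write workloads without any application changes.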

Hybrid Partitioned

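Hybrid partitioned tables: splitting larger tables into smaller chunks, or partitions, makes them easier to manage and can improve performance by focusing operations only on the data they apply to. The Oracle Database supports multiple models for partitioning data as well as online operations for partition management. However, as enterprise data keeps growing in size and complexity and regulations require it to stay online at all times, we need to look at new models for managing it. With hybrid partitioned tables, DBAs can still split data into manageable partitions as before, but they can now choose which partitions stay inside the database for fast querying and updating and which partitions become read-only and are stored in external partitions. These external partitions can be kept on a standard file system or on low-cost HDFS. DBAs can also choose to place the data in a cloud-based object store, thereby "extending" the table into the cloud.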

JSON

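JSON enhancements: Oracle Database 19c includes a large number of incremental improvements to JSON support, from simplified SQL functions to partial updates of JSON documents.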

Memoptimized Rowstore

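Memoptimized Rowstore: this feature enables fast inserts into Oracle Database 19c from applications such as the Internet of Things (IoT), which ingest small, high-volume transactions with minimal transactional overhead. Insert operations that use fast ingest temporarily buffer the data in the large pool and then write it to disk in bulk in a deferred, asynchronous manner. The feature is implemented by allocating a new memory sub-pool from the SGA called the Memoptimize Pool, which is itself split into two parts, the Memoptimize Buffer Area (75%) and the Hash Index (25%). I will cover this in detail in a later post.

Memoptimized Rowstore provides the following capabilities:
Fast ingest
Fast ingest optimizes the processing of high-frequency, single-row inserts into the database. It uses the large pool to buffer the inserts before they are written to disk, which improves insert performance.

Fast lookup
Fast lookup enables fast retrieval of data from the database for high-frequency queries. It uses a separate memory area in the SGA, called the memoptimize pool, to buffer the data queried from tables, which improves query performance. This capability was introduced in 18c.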

Quarantine SQL Statements

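Quarantine SQL Statements: runaway SQL statements that are terminated by Resource Manager because they consume excessive processor and I/O resources can now be quarantined automatically. This prevents these runaway statements from executing again and protects Oracle Database 19c from performance degradation.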

Real Time Statistics

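Real-Time Statistics: the query optimizer needs detailed statistics on the structure and composition of the data in tables in order to make the "best" decisions about how to execute complex queries. The problem is that statistics gathering can be resource-intensive and take a while. For today's "always on" applications, it is hard to find a window in which to run a batch job that gathers this data. In Oracle Database 19c, statistics can now be gathered in real time as insert, delete, and update operations change the data. Customers no longer have to compromise between the quality of the statistics the optimizer relies on and finding a suitable time for statistics maintenance.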

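Summary

To sum up, Oracle 19c puts stability first and resolves known issues in the hundreds of new features introduced in 12c and 18c, which makes it the ideal target release for upgrades within the 12c family (if you are planning an upgrade to 12c or 18c, consider going straight to 19c). Last month I also mentioned on WeChat that 18c RU5 fixed 1050 known bugs, so you can imagine how easy it is to run into "little surprises" on earlier releases. In addition, 19c introduces several new features of its own, especially on the autonomous side, such as automatic index creation and automatic quarantine of problem SQL, as well as Memoptimized Rowstore, the effort made on a traditional database for IoT workloads. If you have installed 19c, you will also have noticed that installation is much simpler; for example, root.sh and netca are now integrated. Follow my blog; I will dig deeper into each feature in later posts.

Installing a SINGLE INSTANCE 19c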

$ mkdir /media/disk
$ mount /dev/cdrom /media/disk
$ cd /etc/yum.repos.d

$ mv public-yum-ol7.repo public-yum-ol7.repo_bak

$ vi public-yum-ol7.repo

[oel7]
name = Enterprise Linux 7.3 DVD
baseurl=file:///media/disk/
gpgcheck=0
enabled=1

$ yum install oracle-database-server-12cR2-preinstall-1.0-2.el7.x86_64.rpm

$ vi /etc/selinux/config
SELINUX=disabled

mkdir -p /u01/app/oracle/product/19.2.0/db_1
chown -R oracle:oinstall /u01
chmod -R 775 /u01

$ vi /etc/security/limits.conf

oracle soft nofile 1024
oracle hard nofile 65536
oracle soft nproc 2047
oracle hard nproc 16384
oracle soft stack 10240
oracle hard stack 32768

$ passwd oracle

$ su - oracle
$ id oracle
uid=54321(oracle) gid=54321(oinstall) groups=54321(oinstall),54322(dba) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

$ vi .bash_profile

export TMP=/tmp
export TMPDIR=$TMP
export ORACLE_BASE=/u01/app/oracle
export ORACLE_HOME=$ORACLE_BASE/product/19.2.0/db_1
export ORACLE_SID=anbob19c
export PATH=/usr/sbin:$PATH
export PATH=$ORACLE_HOME/bin:$PATH
export LD_LIBRARY_PATH=$ORACLE_HOME/lib:/lib:/usr/lib
export CLASSPATH=$ORACLE_HOME/jlib:$ORACLE_HOME/rdbms/jlib

# upload file to /u01/app/oracle/product/19.2.0/db_1

$ cd /u01/app/oracle/product/19.2.0/db_1
$ unzip "V981623-01 oracle 19.2.zip"

$ ls
addnode crs dbjava dmu hs jdbc md olap ords plsql rdbms sdk sqlplus utl
apex css dbs drdaas install jdk mgw OPatch oss precomp relnotes slax srvm V981623-01 oracle 19.2.zip
assistants ctx deinstall dv instantclient jlib network opmn oui QOpatch root.sh sqldeveloper suptools wwg
bin cv demo env.ora inventory ldap nls oracore owm R runInstaller sqlj ucp xdk
clone data diagnostics has javavm lib odbc ord perl racg schagent.conf sqlpatch usm

$ export DISPLAY=192.168.56.1:0
$ ./runInstaller
Launching Oracle Database Setup Wizard..

-- software only

$ dbca -initParams "_exadata_feature_on=true"

SQL> @i

USERNAME INST_NAME HOST_NAME SID SERIAL# VERSION STARTED SPID OPID CPID SADDR PADDR
-------------------- -------------------- ------------------------- ----- -------- ---------- -------- ---------- ----- --------------- ---------------- ----------------
SYS anbob19c localhost.localdomain 390 56967 19.0.0.0.0 20190220 2529 28 2453 0000000067081028 0000000067CFFEC8

SQL> select * from dba_auto_index_config;

PARAMETER_NAME PARAMETER_VALUE LAST_MODIFIED MODIFIED_BY
---------------------------------------- ------------------------------ ------------------------------ ------------------------------
AUTO_INDEX_DEFAULT_TABLESPACE
AUTO_INDEX_MODE OFF
AUTO_INDEX_REPORT_RETENTION 31
AUTO_INDEX_RETENTION_FOR_AUTO 373
AUTO_INDEX_RETENTION_FOR_MANUAL
AUTO_INDEX_SCHEMA
AUTO_INDEX_SPACE_BUDGET 50

7 rows selected.

Reference https://blogs.oracle.com/oracle-database/oracle-database-19c-now-available-on-oracle-exadata

Oracle 19c New Feature: Automatic Indexing


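In the previous post, A Brief Look at Oracle Database 19c, I noted that Oracle Database 19c introduces automatic indexing, which continuously evaluates the executed SQL and the underlying tables to determine which indexes to create and which to potentially drop. It does this with an expert system, an "expert" that works around the clock, 7*24.

How it works

Automatic indexing is driven by a background index-management task that can automatically create, rebuild, and drop indexes. The task is invoked every 15 minutes (it is executed by a j001 job process, and the 15-minute interval is controlled by the _AUTO_INDEX_TASK_INTERVAL parameter). It follows the same approach as traditional manual SQL tuning: identify candidate indexes from the columns used in the SQL, verify the effect of each automatic index on performance, and then create the index according to the preset values. The difference is that the whole process is automatic, and every step is covered by an audit report.

1. Capture
Periodically capture the application SQL history into a SQL repository, including the SQL text, execution plans, bind variables, execution statistics, and so on.
2. Identify Candidates
Identify candidate indexes that would benefit the newly captured SQL, create them as metadata-only (unusable, invisible) indexes, and drop indexes made obsolete by the newly created ones.
3. Verify
Verify whether the optimizer would use the newly created indexes for the captured SQL. If an index improves SQL performance, it is materialized. All verification is done outside the application workflow.
4. Decide
If the index improves performance for all SQL, the automatic index is made visible. If it makes all SQL slower, it stays invisible. If it makes only some SQL slower, the index is made visible but remains unavailable to the statements that regressed.
5. Online Validation
Validate the use of the new index online for other SQL. At first only one session is allowed to use the index for a given SQL statement, so that a problem does not spread widely.
6. Monitor
Automatic indexes are monitored continuously, and automatically created indexes that go unused for a long time are dropped automatically. The period before they are dropped is configurable.

Automatic indexing applies to all stages: development, test, and production. According to last year's OOW presentation, automatic indexing will support single-column indexes, multi-column indexes, function-based indexes, and compressed indexes (Compression Advanced Low), but the current 19c online documentation only mentions local B-tree indexes, on both partitioned and non-partitioned tables, with temporary tables excluded; perhaps later releases will catch up. Automatic indexing consumes some CPU, memory, and storage. Resource Manager limits the task to a single CPU, and you can specify the tablespace that holds automatic indexes and the percentage of that tablespace they may use. Some promotional material also says that a dedicated TEMP tablespace can be specified for automatic indexing (AUTO_INDEX_TEMP_TABLESPACE), but this cannot be changed in the current release.

In fact, if you look at how an automatic index is created, it is a combination of index features that Oracle introduced gradually over earlier releases, as shown below: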

create index ANBOB.SYS_AI_XXXXXX ON ANBOB.T1(ID) TABLESPACE USERS UNUSABLE INVISIBLE AUTO COMPRESS ADVANCED LOW ONLINE;  

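Note: automatically created indexes are prefixed with SYS_AI. I have an earlier post with notes on naming patterns that start with SYS.

Related views

The following views and columns related to automatic indexing were introduced in 19c.

DBA_AUTO_INDEX_CONFIG                   -- describes the current automatic indexing configuration
DBA_INDEXES/ALL_INDEXES/USER_INDEXES    -- the new AUTO column shows whether an index is automatic (YES) or manual (NO)
DBA_AUTO_INDEX_EXECUTIONS               -- shows the execution history of the automatic indexing task
DBA_AUTO_INDEX_STATISTICS               -- shows statistics related to automatic indexes
DBA_AUTO_INDEX_IND_ACTIONS              -- shows the actions performed on automatic indexes
DBA_AUTO_INDEX_SQL_ACTIONS              -- shows the actions performed on SQL statements to verify automatic indexes
DBA_AUTO_INDEX_VERIFICATIONS            -- lists the PLAN_HASH_VALUE of automatic indexes, BUFFER_GETS comparisons, and so on

Related configuration

The configuration can be changed at the CDB level or at the PDB level.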

SQL> @i

USERNAME             INST_NAME            HOST_NAME                 SID   SERIAL#  VERSION    STARTED  SPID       OPID  CPID            SADDR            PADDR
-------------------- -------------------- ------------------------- ----- -------- ---------- -------- ---------- ----- --------------- ---------------- ----------------
SYS                  anbob19c             localhost.localdomain     390   56967    19.0.0.0.0 20190220 2529       28    2453            0000000067081028 0000000067CFFEC8

SQL> show pdbs

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 PDB1                           READ WRITE 

SQL> alter session set container=pdb1;
Session altered.

SQL> select * from dba_auto_index_config;

PARAMETER_NAME                           PARAMETER_VALUE                LAST_MODIFIED                  MODIFIED_BY
---------------------------------------- ------------------------------ ------------------------------ ------------------------------
AUTO_INDEX_DEFAULT_TABLESPACE
AUTO_INDEX_MODE                          OFF
AUTO_INDEX_REPORT_RETENTION              31
AUTO_INDEX_RETENTION_FOR_AUTO            373
AUTO_INDEX_RETENTION_FOR_MANUAL
AUTO_INDEX_SCHEMA
AUTO_INDEX_SPACE_BUDGET                  50

7 rows selected.

-- check the execution plan
Execution Plan
----------------------------------------------------------
Plan hash value: 4100957058

---------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                   |    14 |   714 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| SMB$CONFIG        |    14 |   714 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | I_SMB$CONFIG_PKEY |    14 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("PARAMETER_NAME" LIKE 'AUTO_INDEX%')
       filter("PARAMETER_NAME" LIKE 'AUTO_INDEX%')
Note
-----
   - Unoptimized XML construct detected (enable XMLOptimizationCheck for more information)

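Note:
Query dba_auto_index_config or cdb_auto_index_config to see the current configuration of the automatic indexing feature. The view is built on the SMB$CONFIG base table, and DBMS_AUTO_INDEX.CONFIGURE can be used to change the settings. For example:
AUTO_INDEX_DEFAULT_TABLESPACE   -- the tablespace in which automatic indexes are created; defaults to the database default tablespace
AUTO_INDEX_MODE      -- the automatic indexing mode (on/off switch). There are currently 3 values. The default, OFF, disables the feature. IMPLEMENT creates, tests, and reports on indexes automatically, and the resulting indexes are visible. REPORT ONLY creates the indexes but leaves them invisible, so they do not affect SQL; it is only meant to produce reports.
AUTO_INDEX_REPORT_RETENTION    -- number of days of automatic indexing report history to keep; default 31 days
AUTO_INDEX_RETENTION_FOR_AUTO     -- number of days after its last use before an unused automatically created index may be dropped; default 373 days
AUTO_INDEX_RETENTION_FOR_MANUAL     -- number of days after its last use before an unused manually created index may be dropped; default is never
AUTO_INDEX_SCHEMA
AUTO_INDEX_SPACE_BUDGET    -- percentage of the tablespace size that automatic indexes may use; default 50%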
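
To change several of these settings in one go, the DBMS_AUTO_INDEX.CONFIGURE calls can be generated as a small SQL*Plus script. The sketch below is just one way to do it: the helper name auto_index_cfg_sql is made up, the values 90 and 30 are arbitrary examples rather than recommendations, and the helper does not escape single quotes in its arguments.

```shell
#!/bin/sh
# Emit one DBMS_AUTO_INDEX.CONFIGURE call as SQL*Plus text.
# $1 = parameter name, $2 = parameter value (single quotes are not escaped).
auto_index_cfg_sql() {
    printf "EXEC DBMS_AUTO_INDEX.CONFIGURE('%s','%s');\n" "$1" "$2"
}

# Example: report-only mode, shorter retention for unused auto indexes,
# and a smaller space budget. The numbers are illustrative only.
auto_index_cfg_sql AUTO_INDEX_MODE 'REPORT ONLY'
auto_index_cfg_sql AUTO_INDEX_RETENTION_FOR_AUTO 90
auto_index_cfg_sql AUTO_INDEX_SPACE_BUDGET 30
```

The output can be redirected to a .sql file and run from SQL*Plus in the container where the change should apply (pdb1 in this post), after which dba_auto_index_config should show the new values.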

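How to trace Automatic Index

The execution plan of the configuration query above shows that the view is based on the SMB$CONFIG table. That table also holds many hidden parameters, including the scheduling interval of the automatic indexing task, tracing, and some resource limits. 🤓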
SQL>  select * from  SMB$CONFIG;

PARAMETER_NAME                           PARAMETER_VALUE LAST_UPDATED                        UPDATED_BY PARAMETER_DATA
---------------------------------------- --------------- ----------------------------------- ---------- ----------------------------------------
SPACE_BUDGET_PERCENT                                  10
PLAN_RETENTION_WEEKS                                  53
SPM_TRACING                                            0
AUTO_CAPTURE_PARSING_SCHEMA_NAME                       0                                                
AUTO_CAPTURE_MODULE                                    0                                                
AUTO_CAPTURE_ACTION                                    0                                                
AUTO_CAPTURE_SQL_TEXT                                  0                                                
AUTO_INDEX_SCHEMA                                      0                                                
AUTO_INDEX_DEFAULT_TABLESPACE                          0
AUTO_INDEX_SPACE_BUDGET                               50
AUTO_INDEX_REPORT_RETENTION                           31
AUTO_INDEX_RETENTION_FOR_AUTO                          0                                                373
AUTO_INDEX_RETENTION_FOR_MANUAL                        0
AUTO_INDEX_MODE                                        0 24-FEB-19 12.24.02.000000 AM        SYS        IMPLEMENT
_AUTO_INDEX_TRACE                                      0
_AUTO_INDEX_TASK_INTERVAL                            900
_AUTO_INDEX_TASK_MAX_RUNTIME                        3600
_AUTO_INDEX_IMPROVEMENT_THRESHOLD                     20
_AUTO_INDEX_REGRESSION_THRESHOLD                      10
_AUTO_INDEX_ABSDIFF_THRESHOLD                        100
_AUTO_INDEX_STS_CAPTURE_TASK                           0 24-FEB-19 12.24.02.000000 AM        SYS        ON
_AUTO_INDEX_CONTROL                                    0
_AUTO_INDEX_DERIVE_STATISTICS                          0                                                ON
_AUTO_INDEX_CONCURRENCY                                1
_AUTO_INDEX_SPA_CONCURRENCY                            1
_AUTO_INDEX_REBUILD_TIME_LIMIT                        30
_AUTO_INDEX_REBUILD_COUNT_LIMIT                        5
AUTO_SPM_EVOLVE_TASK                                   0                                                OFF
AUTO_SPM_EVOLVE_TASK_INTERVAL                       3600
AUTO_SPM_EVOLVE_TASK_MAX_RUNTIME                    1800


How to enable tracing
update SMB$CONFIG set parameter_value=2 where parameter_name='_AUTO_INDEX_TRACE';
Or
call dbms_auto_index_internal.configure('_AUTO_INDEX_TRACE',2,true,true)
To turn tracing off, replace 2 with 0.
Then search the trace files:
grep ^AI $ORACLE_BASE/diag/rdbms/*/$ORACLE_SID/trace/${ORACLE_SID}_*trc
You can also follow the index operations in DBA_AUTO_INDEX_IND_ACTIONS.
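
One detail of the grep is easy to get wrong: in a POSIX shell, $ORACLE_SID_ is read as a variable named ORACLE_SID_, so the instance name has to be written as ${ORACLE_SID} when an underscore follows it. A minimal sketch of the same search wrapped in a function follows; the function name ai_trace_lines and the demo file are made up, and the "AI" line prefix is taken from the grep above rather than from any documentation.

```shell
#!/bin/sh
# Print the lines that start with "AI" from the trace files of one instance.
# $1 = trace directory, $2 = instance name (ORACLE_SID).
ai_trace_lines() {
    dir=$1
    sid=$2
    # The braces matter: "$sid_" would be read as a variable named sid_.
    grep -h '^AI' "$dir"/"${sid}"_*.trc 2>/dev/null
}

# Demo on a throwaway directory; the file name and its contents are invented.
demo=$(mktemp -d)
printf 'AI demo line\nsome other line\n' > "$demo/demo_j001_1.trc"
ai_trace_lines "$demo" demo
rm -rf "$demo"
```

On a real system the first argument would be the instance's trace directory under $ORACLE_BASE/diag/rdbms and the second would be "$ORACLE_SID".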

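Feature demo

There are two ways to test this feature. One is to wait 15 minutes for the background task to capture the SQL and create the indexes. The other is the hint /*+ USE_AUTO_INDEXES */, which was presented at OOW 2018, but it does not work in the currently released version and will have to wait for a later release.

1. Automatic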

alter session set container=pdb1;

create tablespace auto_index_tbs datafile '/u01/app/oracle/oradata/ANBOB19C/pdb1/auto_index_tbs01.dbf' size 5g;

EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_DEFAULT_TABLESPACE','AUTO_INDEX_TBS');
EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_MODE','IMPLEMENT');

create user anbob identified by anbob;
grant connect,resource to anbob;
alter user anbob quota 5g on users;
alter user anbob quota 1g on  auto_index_tbs;

create table anbob.tobj as select * from all_objects;
insert into anbob.tobj  select * from anbob.tobj;
insert into anbob.tobj  select * from anbob.tobj;

select object_name from anbob.tobj where object_id=7;
select object_name from anbob.tobj where object_id=8;
select object_name from anbob.tobj where object_id=9;
select object_name from anbob.tobj where object_id=10;

# check whether automatic indexes were created
select * from DBA_AUTO_INDEX_EXECUTIONS;
SQL> select * from DBA_AUTO_INDEX_EXECUTIONS order by 1 desc;

EXECUTION_NAME                           EXECUTION_START     EXECUTION_END       ERROR_MESSAGE                  STATUS
---------------------------------------- ------------------- ------------------- ------------------------------ -----------
SYS_AI_2019-02-24/02:43:32               2019-02-24 02:43:32 2019-02-24 02:43:40                                COMPLETED
SYS_AI_2019-02-24/02:28:30               2019-02-24 02:28:30 2019-02-24 02:28:36                                COMPLETED
SYS_AI_2019-02-24/02:13:28               2019-02-24 02:13:28 2019-02-24 02:13:43                                COMPLETED
SYS_AI_2019-02-24/01:58:26               2019-02-24 01:58:26 2019-02-24 01:58:33                                COMPLETED

select * from DBA_AUTO_INDEX_STATISTICS;

select * from DBA_AUTO_INDEX_IND_ACTIONS;
COMMAND
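----------------------------------------------------------------
STATEMENT
------------------------------------------------------------------------
START_TIME END_TIME ERROR#
------------------- ------------------- ----------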
SYS_AI_2019-02-24/02:13:28 1
SYS_AI_672r7cg8xggpp
ANBOB
TOBJ
ANBOB
CREATE INDEX
CREATE INDEX "ANBOB"."SYS_AI_672r7cg8xggpp" ON "ANBOB"."TOBJ"("OBJECT_NAME") TABLESPACE "AUTO_INDEX_TBS" UNUSABLE INVISIBLE AUTO COMPRESS ADVANCED LOW ONLINE
2019-02-24 02:13:35 2019-02-24 02:13:36 0

SYS_AI_2019-02-24/02:13:28 2
SYS_AI_gg8213wrt5npc
ANBOB
TOBJ
ANBOB
CREATE INDEX
CREATE INDEX "ANBOB"."SYS_AI_gg8213wrt5npc" ON "ANBOB"."TOBJ"("OBJECT_ID") TABLESPACE "AUTO_INDEX_TBS" UNUSABLE INVISIBLE AUTO COMPRESS ADVANCED LOW ONLINE
2019-02-24 02:13:36 2019-02-24 02:13:36 0

# check the new indexes

SQL> @ind anbob.tobj
Display indexes where table or index name matches %anbob.tobj%...

TABLE_OWNER          TABLE_NAME                     INDEX_NAME                     POS# COLUMN_NAME                    DSC
-------------------- ------------------------------ ------------------------------ ---- ------------------------------ ----
ANBOB                TOBJ                           SYS_AI_672r7cg8xggpp              1 OBJECT_NAME
                                                    SYS_AI_gg8213wrt5npc              1 OBJECT_ID


INDEX_OWNER          TABLE_NAME                     INDEX_NAME                     IDXTYPE    UNIQ STATUS   PART TEMP  H     LFBLKS           NDK   NUM_ROWS       CLUF LAST_ANALYZED       DEGREE VISIBILIT
-------------------- ------------------------------ ------------------------------ ---------- ---- -------- ---- ---- -- ---------- ------------- ---------- ---------- ------------------- ------ ---------
ANBOB                TOBJ                           SYS_AI_672r7cg8xggpp           NORMAL     NO   UNUSABLE NO   N     3       1817         59708     284952     120792 2019-02-24 02:13:36 1      INVISIBLE
                     TOBJ                           SYS_AI_gg8213wrt5npc           NORMAL     NO   UNUSABLE NO   N     3        658         72248     284952     120792 2019-02-24 02:13:36 1      INVISIBLE

# check the execution plan
select * from anbob.tobj where object_id=11;

@xi

# each auto index task produces a report, which can be viewed in text, xml, or html format, for example:
-- text
select DBMS_AUTO_INDEX.REPORT_LAST_ACTIVITY('TEXT','ALL','ALL') from dual;

SQL> select DBMS_AUTO_INDEX.REPORT_LAST_ACTIVITY('TEXT','ALL','ALL') from dual;

DBMS_AUTO_INDEX.REPORT_LAST_ACTIVITY('TEXT','ALL','ALL')

GENERAL INFORMATION
-------------------------------------------------------------------------------
 Activity start               : 24-FEB-2019 02:43:32
 Activity end                 : 24-FEB-2019 02:43:40
 Executions completed         : 1
 Executions interrupted       : 0
 Executions with fatal error  : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
 Index candidates            : 0
 Indexes created             : 0
 Space used                  : 0 B
 Indexes dropped             : 0
 SQL statements verified     : 0
 SQL statements improved     : 0
 SQL plan baselines created  : 0
 Overall improvement factor  : 0x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
 Unused indexes    : 0
 Space used        : 0 B
 Unusable indexes  : 0
-------------------------------------------------------------------------------

ERRORS
---------------------------------------------------------------------------------------------
No errors found.
---------------------------------------------------------------------------------------------

-- html
 set serveroutput on
 declare
 report clob := null;
 begin
 report := DBMS_AUTO_INDEX.REPORT_ACTIVITY(
 activity_start => sysdate-1,
 activity_end => sysdate,
 type => 'HTML',
 section => 'ALL',
 level => 'ALL');
 dbms_output.put_line(report);
 end;
 /

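Note:
The example above shows that the automatic indexes have reached the invisible, unusable stage. Verification of whether these auto indexes improve SQL performance, after which they would be materialized and made visible, has not finished yet.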


2. The USE_AUTO_INDEXES hint
select /*+ USE_AUTO_INDEXES */ object_id from anbob.tobj where object_name='OBJ$';

SQL> select /*+USE_AUTO_INDEXES */ object_id from anbob.tobj where object_name='OBJ$';

 OBJECT_ID
----------
        18
        18
        18
        18

4 rows selected.

SQL> @x2
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------
Plan hash value: 1825173622
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     5 |   200 |  1512   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TOBJ |     5 |   200 |  1512   (1)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("OBJECT_NAME"='OBJ$')

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (E - Syntax error (1))
---------------------------------------------------------------------------

   1 -  SEL$1
         E -  USE_AUTO_INDEXES

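Note:
This shows that the hint cannot be used in the current version; the hint report gives the reason as a SQL syntax error. I will keep an eye on later releases. The documentation says the hint that disables automatic indexes is NO_USE_AUTO_INDEXES, which is currently unusable as well.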

 

Oracle 19c New Feature: Hint Report


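Oracle 19c introduces a new format option, "hint report". The hint report shows the hints used in the SQL text, and the report body shows, per query block, whether each hint was used. By default, the TYPICAL format of the dbms_xplan display functions only shows the invalid hints. SQL hints were introduced in Oracle 7 as a way to steer the CBO/RBO optimizer toward a specific execution plan. Before 19c, a hint specified in a SQL statement could go unused for some reason without any indication why. From 19c on, the hint report shows clearly which hints were used and why the others were not, for example syntax errors, unresolved hints, conflicting hints, hints affected by transformations, and so on. Other causes include the OPTIMIZER_IGNORE_HINTS parameter, the new 19c parameter OPTIMIZER_IGNORE_PARALLEL_HINTS, or an index hint whose index has been renamed, dropped, or invalidated.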

SQL> select /*+ FULL(tobj) INDEX(BLABLABLA) BLABLABLA(tobj) bb */ count(*) from tobj;

  COUNT(*)
----------
    285000

SQL> select * from dbms_xplan.display_cursor(format=>'-cost');

PLAN_TABLE_OUTPUT
----------------------------------------------
SQL_ID  7ht5n0h8c87h7, child number 0
-------------------------------------
select /*+ FULL(tobj) INDEX(BLABLABLA) BLABLABLA(tobj) bb */ count(*)
from tobj

Plan hash value: 1381534028
------------------------------------------------------
| Id  | Operation          | Name | Rows  | Time     |
------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |          |
|   1 |  SORT AGGREGATE    |      |     1 |          |
|   2 |   TABLE ACCESS FULL| TOBJ |   285K| 00:00:01 |
------------------------------------------------------

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 2 (N - Unresolved (1), E - Syntax error (1))
---------------------------------------------------------------------------

   1 -  SEL$1
         N -  INDEX(BLABLABLA)
         E -  BLABLABLA

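The hint report above shows 2 hints in query block SEL$1 that could not be used. One is blablabla(), a syntax error because no such hint exists. The other is index(alias name), which is Unresolved because the table does not have that alias. In addition, FULL is a valid hint, and bb was treated as a comment and ignored.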

SQL> select * from dbms_xplan.display_cursor('7ht5n0h8c87h7',format=>'+HINT_REPORT');

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------
SQL_ID  7ht5n0h8c87h7, child number 0
-------------------------------------
select /*+ FULL(tobj) INDEX(BLABLABLA) BLABLABLA(tobj) bb */ count(*)
from tobj

Plan hash value: 1381534028

-------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |  1501 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |            |          |
|   2 |   TABLE ACCESS FULL| TOBJ |   285K|  1501   (1)| 00:00:01 |
-------------------------------------------------------------------
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 3 (N - Unresolved (1), E - Syntax error (1))
---------------------------------------------------------------------------
   1 -  SEL$1
         N -  INDEX(BLABLABLA)
         E -  BLABLABLA

   2 -  SEL$1 / TOBJ@SEL$1
           -  FULL(tobj)

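Here the hint report shows 3 hints in total. full() is a valid hint and is only displayed when hint_report is used. format also accepts hint_report_unused, which is the default, and hint_report_used, which I found gives the same output as hint_report; that may be a defect in the current version.

The OTHER_XML column of plan_table and v$sql_plan also carries the hint information, although the format is less readable and it is undocumented as well.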

SQL> select extract(xmltype(other_xml),'//hint_usage') from v$sql_plan where other_xml like '%hint_usage%' and sql_id='7ht5n0h8c87h7';

EXTRACT(XMLTYPE(OTHER_XML),'//HINT_USAGE')
----------------------------------------------------------------------------------------------------------------
<hint_usage><q><n><![CDATA[SEL$1]]></n><h o="EM" st="PE"><x><![CDATA[BLABLABLA]]></x></h><t><f><![CDATA["TOBJ"@"SEL$1"]]></f><h o="EM"><x><![CDATA[FULL(tobj)]]></x></h></t><t st="UR"><h o="EM"><x><![CDATA[INDEX(BLABLABLA)]]></x></h></t></q></hint_usage>

  • '<n>' is the query block name (the hint scope can be statement '<s>', query block '<n>', or alias '<f>')

  • '@st' is PE for a parsing syntax error ('E' in the dbms_xplan note)

  • '@st' is UR for unresolved ('N' in the dbms_xplan note)

  • '@st' is 'NU' or 'EU' for unused ('U' in the dbms_xplan note)

  • '<x>' is the hint text

  • we might get a reason for unused ones in '<r>'
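
When reading the raw XML, it can help to translate the st codes into the letters that dbms_xplan prints. The mapping below is a small sketch built only from the list above; the function name hint_state_letter is made up, and treating a missing st attribute as a used hint (shown as "-") is my assumption based on how FULL(tobj) appears in the example.

```shell
#!/bin/sh
# Map an st code from hint_usage to the letter shown in the dbms_xplan hint report.
hint_state_letter() {
    case $1 in
        PE)    printf '%s\n' E ;;    # parsing syntax error
        UR)    printf '%s\n' N ;;    # unresolved
        NU|EU) printf '%s\n' U ;;    # unused
        *)     printf '%s\n' '-' ;;  # assumption: no st attribute means the hint was used
    esac
}

# Walk through the known codes plus the empty (no attribute) case.
for st in PE UR NU EU ''; do
    printf '%-4s -> %s\n' "${st:-none}" "$(hint_state_letter "$st")"
done
```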

10053 TRACE

Hint Report:
Query Block: SEL$1
Syntax Error: BLABLABLA
Table: ("TOBJ"@"SEL$1")
FULL(tobj)
Table:
Unresolved: INDEX(BLABLABLA)
End Hint Report
Dumping Hints
=============
atom_hint=(@=0x7f1639d83668 err=0 resol=1 used=1 token=83 org=1 lvl=3 txt=INDEX ("BLABLABLA") )
atom_hint=(@=0x7f1639d81498 err=0 resol=1 used=1 token=448 org=1 lvl=3 txt=FULL ("TOBJ") )

Other examples

SQL> select  /*+ first_rows(1) first_rows(2) */ count(*) from tobj;

  COUNT(*)
----------
    285000

SQL> @x2

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
Plan hash value: 1381534028

-------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |  1501   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |            |          |
|   2 |   TABLE ACCESS FULL| TOBJ |   285K|  1501   (1)| 00:00:01 |
-------------------------------------------------------------------

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 2 (U - Unused (2))
---------------------------------------------------------------------------

   0 -  STATEMENT
         U -  first_rows(1) / conflicting optimizer mode hints
         U -  first_rows(2) / conflicting optimizer mode hints

17 rows selected.

SQL> select  /*+ first_rows(1) first_rows(1) */ count(*) from tobj;

  COUNT(*)
----------
    285000

SQL> @x2

PLAN_TABLE_OUTPUT
------------------------------------------------------------------
Plan hash value: 1381534028

-------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |  1501   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |            |          |
|   2 |   TABLE ACCESS FULL| TOBJ |   285K|  1501   (1)| 00:00:01 |
-------------------------------------------------------------------

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (U - Unused (1))
---------------------------------------------------------------------------

   0 -  STATEMENT
         U -  first_rows(1) / duplicate hint

16 rows selected.

SQL> select  /*+index(tobj idx1) ignore_optim_embedded_hints */ count(*) from tobj;

  COUNT(*)
----------
    285000

SQL> @x2

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------
Plan hash value: 1381534028

-------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |  1501   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |            |          |
|   2 |   TABLE ACCESS FULL| TOBJ |   285K|  1501   (1)| 00:00:01 |
-------------------------------------------------------------------

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (U - Unused (1))
---------------------------------------------------------------------------

   2 -  SEL$1 / TOBJ@SEL$1
         U -  index(tobj idx1) / rejected by IGNORE_OPTIM_EMBEDDED_HINTS

16 rows selected.

SQL> alter session set optimizer_ignore_parallel_hints=true;

Session altered.

SQL> select  /*+parallel(tobj 8) */ count(*) from tobj;

  COUNT(*)
----------
    285000

SQL> @x2

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------
Plan hash value: 1381534028

-------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |  1501   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |            |          |
|   2 |   TABLE ACCESS FULL| TOBJ |   285K|  1501   (1)| 00:00:01 |
-------------------------------------------------------------------

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (U - Unused (1))
---------------------------------------------------------------------------

   2 -  SEL$1 / TOBJ@SEL$1
         U -  parallel(tobj 8) / because of _optimizer_ignore_parallel_hints
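
When many statements have to be checked, the Hint Report sections shown above can be scanned by a script instead of by eye. The following is a minimal Python sketch, assuming the report lines keep the "LETTER - hint / reason" layout seen in these examples; the function name and regular expression are illustrative, not an Oracle-provided interface.

```python
import re

# Matches detail lines such as:
#   "         U -  first_rows(1) / conflicting optimizer mode hints"
HINT_LINE = re.compile(r"^\s*([A-Z])\s+-\s+(.*?)(?:\s+/\s+(.*))?$")


def parse_hint_report(plan_text):
    """Return (status, hint, reason) tuples found after the 'Hint Report' header."""
    results = []
    in_report = False
    for line in plan_text.splitlines():
        if line.startswith("Hint Report"):
            in_report = True
            continue
        if not in_report:
            continue
        m = HINT_LINE.match(line)
        if m:
            status, hint, reason = m.groups()
            results.append((status, hint.strip(), reason))
    return results
```

Lines such as "0 -  STATEMENT" or "2 -  SEL$1 / TOBJ@SEL$1" start with a digit and are skipped, so only the hint detail lines are returned.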

— over —

Alert: System crashes when reading or writing /dev/pfcdd0 with AIX Flash Cache on AIX 7.1/7.2


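AIX 7.1 and 7.2 are currently the mainstream operating system releases still under IBM support. AIX Flash Cache appears to have been introduced in AIX 7.2 and was later also supported in AIX 7.1 TL4 SP2. Flash Cache is also known as server-side caching of data. It allows an LPAR to use SSDs or flash storage as a read-only cache to improve read performance on spinning disks.

Flash Cache requires two components:

1. Cache management: /usr/sbin/cache_mgt is a command-line tool for creating, assigning, and destroying the flash cache.
2. Cache engine: the algorithm that decides what to cache and retrieves data from the cache.

After installation, it can be checked with the following commands: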

lslpp -l | grep Cache
  bos.pfcdd.rte          7.2.1.0  COMMITTED  Power Flash Cache
  cache.mgt.rte         7.2.1.0  COMMITTED  AIX SSD Cache Device
  bos.pfcdd.rte          7.2.1.0  COMMITTED  Power Flash Cache
  cache.mgt.rte         7.2.1.0  COMMITTED  AIX SSD Cache Device

lslpp -l | grep lash
  bos.pfcdd.rte          7.2.1.0  COMMITTED  Power Flash Cache
	7.2.1.0  COMMITTED  Common CAPI Flash Adapter
	7.2.0.0  COMMITTED  CAPI Flash Adapter Diagnostics
	7.2.0.0  COMMITTED  CAPI Flash Adapter Device
  devices.common.IBM.cflash.rte
	7.2.1.0  COMMITTED  Common CAPI Flash Device
  bos.pfcdd.rte      	7.2.1.0  COMMITTED  Power Flash Cache
	7.2.1.0  COMMITTED  Common CAPI Flash Adapter
  devices.common.IBM.cflash.rte
	7.2.0.0  COMMITTED  Common CAPI Flash Device

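"/dev/pfcdd0" is the pseudo device used by AIX Flash Cache. Searching the IBM website for this keyword turns up the following issue:

On some levels of AIX 7.1 and AIX 7.2, any program that reads from or writes to the "/dev/pfcdd0" device may crash the system. The system administrator should install the corresponding fix promptly. Known Oracle-related occurrences are:

Server Reboots During Silent Mode Grid Installation For RAC 11.2.0.4 (Doc ID 2319755.1)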

Bug 26270499 : RDA COLLECTION CAUSING NODE REBOOT IN RAC

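In the first case, the silent GI installation reads this device; Oracle's suggested workaround is to remove "/dev/pfcdd0" at the OS layer.

In the second case, running Oracle RDA for a database health check crashes the node when the ASM module reads "pfcdd0". The root cause is the AIX bug, so installing the OS patch is the preferred fix. Oracle has also changed the RDA code in newer versions (8.17.17.9.12 or newer), which indirectly avoids the problem.

Reproducing the problem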

# OS layer

$ dd if=/dev/pfcdd0 of=/tmp/pfcdd0.dd  bs=1024k count=10

# Oracle GI layer

$ kfed read /dev/pfcdd0

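Summary:

This alert concerns an Oracle database availability risk caused by a bug in the new AIX Flash Cache device feature. The pit was dug by AIX, but for database installation and health checks (RDA) the DBA and the customer are the direct victims. An Oracle DBA who runs an RDA health check against an ASM database on AIX 7.1 or 7.2 may bring the node down; hopefully you are reading this before falling into the pit. Also take care not to set asm_diskstring to /dev/*, a point I covered earlier in "ASM Disk Discovery Best Practices".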
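
To illustrate why a discovery string of /dev/* is risky here, the following minimal Python sketch checks which patterns in a comma-separated discovery string would match /dev/pfcdd0. It assumes the patterns behave like shell-style globs, which is an approximation of ASM discovery; the function name and the /dev/rhdisk* example pattern are hypothetical.

```python
from fnmatch import fnmatchcase


def diskstring_matches(diskstring, device="/dev/pfcdd0"):
    """Return the patterns in a comma-separated discovery string that match the device."""
    # Strip whitespace and surrounding quotes from each pattern (assumed glob syntax).
    patterns = [p.strip().strip("'\"") for p in diskstring.split(",") if p.strip()]
    return [p for p in patterns if fnmatchcase(device, p)]


if __name__ == "__main__":
    print(diskstring_matches("/dev/*"))        # the broad pattern matches the pseudo device
    print(diskstring_matches("/dev/rhdisk*"))  # a narrower, hypothetical pattern does not
```

A narrower pattern keeps disk discovery away from the Flash Cache pseudo device.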

How to release still “killed“ status session in v$session? (releasing killed sessions) (Part 4)


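How to release still “killed“ status session in v$session? (releasing killed sessions) (Part 1)
How to release still “killed“ status session in v$session? (releasing killed sessions) (Part 2)
How to release still “killed“ status session in v$session? (releasing killed sessions) (Part 3)

Here is another case of a killed session that would not release. The environment is an 11.2.0.4 two-node RAC on SLES 11. It started with a sudden, very high database load; the wait events were "enq: SV - contention", "row cache lock", and "enq: SQ - contention", all related to sequences. SQ is common and is not covered here. When multiple instances contend for a sequence, the main wait event is row cache lock if the sequence is ORDER and NOCACHE, and SV usually appears when the sequence is cached. A batch of processes was killed because of this incident. Afterwards one session stayed in KILLED status and also prevented subsequent AWR generation.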
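
The sequence contention pattern described above can be summarized as a small decision function. This is a minimal Python sketch, not an Oracle API: the function name is illustrative, and splitting cached sequences into ORDER (SV) and NOORDER (SQ) is an assumption based on commonly described RAC behavior rather than something this case proves.

```python
def expected_sequence_wait(cache_size, ordered):
    """Return the wait event likely to dominate when instances contend for a sequence."""
    if cache_size == 0:
        # NOCACHE: every NEXTVAL updates the dictionary row.
        return "row cache lock"
    if ordered:
        # CACHE + ORDER: instances coordinate to preserve ordering (assumed).
        return "enq: SV - contention"
    # CACHE + NOORDER: contention when the cache is refilled (assumed).
    return "enq: SQ - contention"
```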

SQL>@ase

USERNAME          SID EVENT                MACHINE    MODULE               STATUS   LAST_CALL_ET SQL_ID          WAI_SECINW ROW_WAIT_OBJ# SQLTEXT                        BS          CH# OSUSER     HEX
---------- ---------- -------------------- ---------- -------------------- -------- ------------ --------------- ---------- ------------- ------------------------------ ---------- ---- ---------- ---------
PATROLDB          337 library cache: mutex kdechan1   SQL*Plus             ACTIVE          18858 2kxmfvbpdkbj4   0:469             663378 select 'DATAPOINT ' bpb, sql_t 1:4851        0 itmuser      1000000
NGESHOP          4851 On CPU / runqueue    qdtza1     oracle               KILLED          95113 axyfr1t8k6xy7   -1:95111          128735                                :               grid

SQL> @usid 4851

USERNAME                SID                 AUDSID OSUSER           MACHINE            PROGRAM              SPID             OPID CPID                     SQL_ID           HASH_VALUE   LASTCALL STATUS
   SADDR            PADDR            TADDR            LOGON_TIME
----------------------- -------------- ----------- ---------------- ------------------ -------------------- -------------- ------ ------------------------ --------------- ----------- ---------- ------
-- ---------------- ---------------- ---------------- -----------------
NGESHOP                  '4851,22847'    497193017 grid             qdtza1             (TNS V1-V3)          33793             192 21244                    axyfr1t8k6xy7    1361278919      20965 KILLED
   0000000B31C705F0 0000000B5124B188 0000000B2AB94178 20190123 07:00:54

Note:
Session ID 4851 is in KILLED status and is blocking other sessions. From the session information we confirmed that the client is a dblink call.

SQL> @st dba_2pc_pending

no rows selected

SQL> SELECT /*+ ORDERED */
  2        SUBSTR (s.ksusemnm, 1, 10) || '-' || SUBSTR (s.ksusepid, 1, 10)
  3            "ORIGIN",
  4         SUBSTR (g.K2GTITID_ORA, 1, 35) "GTXID",
  5         SUBSTR (s.indx, 1, 4) || '.' || SUBSTR (s.ksuseser, 1, 5) "LSESSION",
  6         s2.username,
  7         SUBSTR (
  8            DECODE (
  9               BITAND (ksuseidl, 11),
 10               1, 'ACTIVE',
 11               0, DECODE (BITAND (ksuseflg, 4096), 0, 'INACTIVE', 'CACHED'),
 12               2, 'SNIPED',
 13               3, 'SNIPED',
 14               'KILLED'),
 15            1,
 16            10)
 17            "Status",
 18         SUBSTR (s2.event, 1, 10) "WAITING"
 19    FROM x$k2gte g,
 20         x$ktcxb t,
 21         x$ksuse s,
 22         v$session s2
 23   WHERE
 24  g.K2GTDXCB = t.ktcxbxba
 25         AND g.K2GTDSES = t.ktcxbses
 26         AND s.addr = g.K2GTDSES
 27         AND s2.sid = s.indx;

ORIGIN                                    GTXID                                                                  LSESSION            USERNAME   Status           WAITING
----------------------------------------- ---------------------------------------------------------------------- ------------------- ---------- ---------------- --------------------
qdtza1-21244                              UCISA.HEBEI.MOBILE.COM.53c50444.361                                    4851.22847          NGESHOP    KILLED           SQL*Net me

# Pmon trace

*** 2019-01-23 18:10:42.356
KGX Atomic Operation Log 0xb97f5f4b8
 Mutex 0xb68781710(4851, 0) idn 565cf76d oper GET_EXCL(5)
 Library Cache uid 303 efd 9 whr 76 slp 12366
 oper=0 pt1=0xb687815d0 pt2=(nil) pt3=(nil)
 pt4=(nil) pt5=(nil) ub4=0

*** 2019-01-23 18:10:52.371
found process 0xb5124b188 pid=192 serial=42 ospid = 33793 dead
KGX cleanup...
KGX Atomic Operation Log 0x2a4ed9008
 Mutex 0xb68781710(4851, 0) idn 565cf76d oper EXCL(6)
 Library Cache uid 4851 efd 13 whr 98 slp 0
 oper=12 pt1=0xb687815d0 pt2=0x197f140b0 pt3=(nil)
 pt4=(nil) pt5=(nil) ub4=0

LibraryHandle:  Address=0xb687815d0 Hash=565cf76d LockMode=0 PinMode=0 LoadLockMode=0 Status=VALD 
  ObjectName:  Name=SELECT COUNT(*) FROM "NGESHOP"."ES_ORDER_*********" "A1" WHERE "A1"."ORDER_ID"='aaf3ea904147492fbd5be1dd14d096b8' 
    FullHashValue=243baddfd5dc6ffdb59840a3565cf76d Namespace=SQL AREA(00) Type=CURSOR(00) Identifier=1448933229 OwnerIdn=246 

*** 2019-01-23 18:13:22.492
KGX cleanup...
KGX Atomic Operation Log 0xb97f5f4b8
 Mutex 0xb68781710(4851, 0) idn 565cf76d oper GET_EXCL(5)
 Library Cache uid 303 efd 9 whr 76 slp 12407
 oper=0 pt1=0xb687815d0 pt2=(nil) pt3=(nil)
 pt4=(nil) pt5=(nil) ub4=0

*** 2019-01-23 18:13:32.504
found process 0xb5124b188 pid=192 serial=42 ospid = 33793 dead
KGX cleanup...
KGX Atomic Operation Log 0x2a4ed9008
 Mutex 0xb68781710(4851, 0) idn 565cf76d oper EXCL(6)
 Library Cache uid 4851 efd 13 whr 98 slp 0
 oper=12 pt1=0xb687815d0 pt2=0x197f140b0 pt3=(nil)
 pt4=(nil) pt5=(nil) ub4=0

LibraryHandle:  Address=0xb687815d0 Hash=565cf76d LockMode=0 PinMode=0 LoadLockMode=0 Status=VALD 
  ObjectName:  Name=SELECT COUNT(*) FROM "NGESHOP"."ES_ORDER_*********" "A1" WHERE "A1"."ORDER_ID"='aaf3ea904147492fbd5be1dd14d096b8' 
    FullHashValue=243baddfd5dc6ffdb59840a3565cf76d Namespace=SQL AREA(00) Type=CURSOR(00) Identifier=1448933229 OwnerIdn=246 

*** 2019-01-23 18:16:02.605
KGX cleanup...
KGX Atomic Operation Log 0xb97f5f4b8
 Mutex 0xb68781710(4851, 0) idn 565cf76d oper GET_EXCL(5)
 Library Cache uid 303 efd 9 whr 76 slp 12330
 oper=0 pt1=0xb687815d0 pt2=(nil) pt3=(nil)
 pt4=(nil) pt5=(nil) ub4=0

*** 2019-01-23 18:16:12.620
found process 0xb5124b188 pid=192 serial=42 ospid = 33793 dead
KGX cleanup...
KGX Atomic Operation Log 0x2a4ed9008
 Mutex 0xb68781710(4851, 0) idn 565cf76d oper EXCL(6)
 Library Cache uid 4851 efd 13 whr 98 slp 0
 oper=12 pt1=0xb687815d0 pt2=0x197f140b0 pt3=(nil)
 pt4=(nil) pt5=(nil) ub4=0

LibraryHandle:  Address=0xb687815d0 Hash=565cf76d LockMode=0 PinMode=0 LoadLockMode=0 Status=VALD 
  ObjectName:  Name=SELECT COUNT(*) FROM "NGESHOP"."ES_ORDER_*********" "A1" WHERE "A1"."ORDER_ID"='aaf3ea904147492fbd5be1dd14d096b8' 
    FullHashValue=243baddfd5dc6ffdb59840a3565cf76d Namespace=SQL AREA(00) Type=CURSOR(00) Identifier=1448933229 OwnerIdn=246 

# hanganalyze trace

-------------------------------------------------------------------------------
Chain 1:
-------------------------------------------------------------------------------
 Oracle session identified by:
 {
 instance: 1 (echandb.echandb1)
 os id: 29849
 process id: 2, oracle@kdechan1 (PMON)
 session id: 303
 session serial #: 1
 }
 is waiting for 'library cache: mutex X' with wait info:
 {
 p1: 'idn'=0x565cf76d
 p2: 'value'=0x12f300000000
 p3: 'where'=0x4c
 time in wait: 21.160256 sec
 timeout after: never
 wait id: 19282212
 blocking: 0 sessions
 wait history:
 * time between current wait and wait #1: 0.021533 sec
 1. event: 'pmon timer'
 time waited: 10.009346 sec
 wait id: 19282211 p1: 'duration'=0x3e8
 * time between wait #1 and #2: 0.012146 sec
 2. event: 'library cache: mutex X'
 time waited: 2 min 29 sec
 wait id: 19282210 p1: 'idn'=0x565cf76d
 p2: 'value'=0x12f300000000
 p3: 'where'=0x4c
 * time between wait #2 and #3: 0.021147 sec
 3. event: 'pmon timer'
 time waited: 10.010484 sec
 wait id: 19282209 p1: 'duration'=0x3e8
 }
 and is blocked by
 => Oracle session identified by:
 {
 instance: 1 (echandb.echandb1)
 os id: 33793 (DEAD)
 process id: 192, oracle@kdechan1
 session id: 4851
 session serial #: 22847
 }
 which is not in a wait:
 {
 last wait: 202 min 18 sec ago
 blocking: 4 sessions
 wait history:
 1. event: 'SQL*Net message from client'
 time waited: 0.001799 sec
 wait id: 13552700 p1: 'driver id'=0x54435000
 p2: '#bytes'=0x1
 * time between wait #1 and #2: 0.000010 sec
 2. event: 'SQL*Net message to client'
 time waited: 0.000002 sec
 wait id: 13552699 p1: 'driver id'=0x54435000
 p2: '#bytes'=0x1
 * time between wait #2 and #3: 0.000240 sec
 3. event: 'SQL*Net message from client'
 time waited: 0.002142 sec
 wait id: 13552698 p1: 'driver id'=0x54435000
 p2: '#bytes'=0x1
 }
 
Chain 1 Signature: <not in a wait><='library cache: mutex X'
Chain 1 Signature Hash: 0x6d47f496
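
The p1 and p2 values of 'library cache: mutex X' in the chain above can be decoded by hand. This is a minimal Python sketch, assuming the commonly described 64-bit layout in which the upper 32 bits of 'value' hold the SID of the mutex holder and the lower 32 bits hold a reference count. That layout is an assumption and is not stated in the trace itself; the function name is illustrative.

```python
def decode_mutex_value(p2):
    """Split the 64-bit mutex 'value' into (holder_sid, ref_count)."""
    holder_sid = (p2 >> 32) & 0xFFFFFFFF  # upper 32 bits: holding session (assumed layout)
    ref_count = p2 & 0xFFFFFFFF           # lower 32 bits: reference count (assumed layout)
    return holder_sid, ref_count


if __name__ == "__main__":
    print(decode_mutex_value(0x12F300000000))  # p2 from the hanganalyze chain
    print(int("565cf76d", 16))                 # p1 'idn' as a decimal number
```

With the values from this trace, the holder SID decodes to 4851, the dead session, and idn 0x565cf76d equals 1448933229, the Identifier printed in the LibraryHandle dump of the PMON trace.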

# systemstate dump trace

PROCESS 192:
----------------------------------------
SO: 0xb5124b188, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0xb5124b188, name=process, file=ksu.h LINE:12721, pg=0
(process) Oracle pid:192, ser:42, calls cur/top: 0xadbd83798/0xadbd83798
flags : (0x1) DEAD
flags2: (0x8000), flags3: (0x10)
intr error: -2147483620, call error: 0, sess error: 0, txn error 0
intr queue: 2147483676 2147483676 2147483676
ksudlp FALSE at location: 0
Cleanup details:
Found dead = 244 min 37 sec ago
Total Cleanup attempts = 99, Cleanup time = 235 min 34 sec, Cleanup timer = 235 min 30 sec
Last attempt (full) started 11 sec ago, Length = in progress
(post info) last post received: 0 0 27
last post received-location: ksa2.h LINE:289 ID:ksasnr
last process to post me: 0xb5125cdc0 65 6
last post sent: 0 0 26
last post sent-location: ksa2.h LINE:285 ID:ksasnd
last process posted by me: 0xb5125cdc0 65 6
(latch info) wait_event=0 bits=0x0
Process Group: DEFAULT, pseudo proc: 0xb993e85d0
skgvtime: process 33793 unix pid wrap detected 1496387522 1499482369
O/S info: user: grid, term: UNKNOWN, ospid: 33793 (DEAD)
OSD pid info: Unix process pid: 33793, image: oracle@kdechan1
Short stack dump: ORA-00072: process “Unix process pid: 33793, image: oracle@kdechan1” is not active

PROCESS 2: PMON
----------------------------------------
SO: 0xb591b8490, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0xb591b8490, name=process, file=ksu.h LINE:12721, pg=0
(process) Oracle pid:2, ser:1, calls cur/top: 0xb59e5a9d8/0xb59e5a9d8
flags : (0xe) SYSTEM
flags2: (0x0), flags3: (0x10)
intr error: 0, call error: 0, sess error: 0, txn error 0
intr queue: empty
ksudlp FALSE at location: 0
(post info) last post received: 0 0 16
last post received-location: ksu.h LINE:14056 ID:ksupsc
last process to post me: 0xb591efaa8 209 0
last post sent: 0 0 38
last post sent-location: ksr2.h LINE:641 ID:ksrchdelete
last process posted by me: 0xb39189cc0 2 6
(latch info) wait_event=0 bits=0x0
Process Group: DEFAULT, pseudo proc: 0xb993e85d0
O/S info: user: oracle, term: UNKNOWN, ospid: 29849
OSD pid info: Unix process pid: 29849, image: oracle@kdechan1 (PMON)
Short stack dump:
ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-semtimedop()+10<-skgpwwait()+178
<-ksliwat()+2022<-kslwaitctx()+163<-ksfwaitctx()+14<-kgxWait()+1330<-kgxExclusive()+447<-kglGetMutex()+212<-kglobfr()+1010<-kgllccl()+1657
<-kglMutexRecovery()+661<-kgxCleanup()+548<-kglMutexCleanupAll()+437<-kglMutexCleanup()+13<-kksCleanSessionState()
+232<-ksudlp()+202<-kssxdl()+489<-kssdel()+644<-ksuxdl()+398<-ksuxda_del_proc()+166<-ksucln_dpc_cleanup()+259<-ksucln_dpc_dfs()
+248<-ksucln_dpc_main()+1188<-ksucln_dpc()+29<-ksucln()+1250<-ksbrdp()+1045<-opirip()+623<-opidrv()+603<-sou2o()+103
<-opimai_real()+250<-ssthrdmain()+265<-main()+201<-__libc_start_main()+230

PMON is failing while cleaning up the dead session of a distributed transaction. Searching MOS with PMON's call stack turns up the following bug.

AWR or ADDM Report Generation Hangs Waiting on ‘library cache:mutex X’ Event ( Doc ID 1960432.1 )

Bug 13542050 : USE OF KGL MUTEXES MIGHT BLOCK ON BOGUS MUTEX HOLDER.

Bug 13542050 – A mutex related hang with holder around 65534 (0xfffe) – superseded ( Doc ID 13542050.8 )
Note that this fix has been superseded by the fix in Bug:24739928

Symptoms:
Hang (Involving Shared Resource)
Mutex Contention
Waits for “cursor: pin X”
Waits for “library cache: mutex X”
Stack is likely to include kgxWait

In the end, the killed session was released only by restarting the instance.

Oracle 12c R2 notes: many crsctl.bin processes with high CPU usage, waiting on "crs call completion"


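A problem I ran into recently: a 12.2 RAC environment showed high CPU usage. top showed that a large number of crsctl.bin processes were responsible, with most of the time spent in sys CPU. Inside the database the waits show the event "crs call completion", sometimes with cascading blocking on "library cache lock". "crs call completion" occurs when the DB instance layer notifies the CRS daemon process.

In 11g this sometimes happened because of bug 10019726, bug 12615394, or bug 12767563, which were fixed in 11.2.0.3, but there appear to be cases of it on 12.1 as well.

On 12c R2 you can try setting the hidden parameter "_notify_crs"=false to skip the CRS notification wait; in theory the change should not affect database performance. The parameter does have side effects worth knowing, though, so the notes on "_notify_crs" are collected below.

Problems addressed by _notify_crs

Besides the problem described above, several other issues are also resolved by changing this parameter: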

1,  The issue is caused by internal, unpublished Bug 13483672 “ORA-7445 [strlen()+16] creating database dependencies for large number of disk groups” which causes a buffer overflow if the diskgroup dependencies of the database resource exceed a certain size.

The Bug 13483672 has been fixed in 11.2.0.3 PSU 3 and windows 11.2.0.3.7 patch bundle. Interim patch has also been provided for 11.2.0.2 on certain platforms.

solution:

set “_notify_crs”=FALSE in pfile or spfile, then restart the database

2, Grid Infrastructure home and rdbms home is running on 11.2.0.3.7. Database instance fail to start, with ORA-7445 [strstr()]

The failing process is gen0, and the current Wait Stack shows waiting for ‘CRS call completion’, The bug fix is included in 12.2.0.2, request/apply the patch if there’s an impact.

The workaround for successful startup of instance is by setting hidden parameter “_notify_crs”=false, which will prevent the database instance from notifying the CRS daemon process.
BUG 22999793 – ORA-07445 [__STRSTR_SSE42()+10] – SIMILAR TO BUG 17230892 THAT CT IS HITTING

3,  The instance fails to start with the following error on GEN0 process,ORA-07445: exception encountered: core dump [PC:0xFE04] [SIGSEGV] [ADDR:0x0] [PC:0xFE04] [Invalid permissions for mapped object] []

The instance is terminated then fails to start with GEN0 process returns PC:0xFE04 [ADDR:0x0] on DB Instance. (Doc ID 2184308.1)

 

Side effects of changing _notify_crs

1, Setting _notify_crs to false prevents the ASM instance from notifying the CRS daemon process when diskgroups are being mounted.The diskgroup creates successfully in ASM however it doesn’t reflect in the OCR (crsctl stat res -t).

2, When hidden parameter “_notify_crs” is set to FALSE, database does not use password file whose location is specified within database resource registered to CRS. As a result, connecting as sysdba fails with ORA-1017

When “_notify_crs” is set to FALSE, database does not use password file specified by database resource, but use default location, Create or copy password file at default location, $ORACLE_HOME/dbs/orapw<ORACLE_SID>
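
Before setting the parameter, it can help to confirm that a password file exists at the default location named above. This is a minimal Python sketch that builds the $ORACLE_HOME/dbs/orapw<ORACLE_SID> path and checks for it; the function names and the example home directory are hypothetical.

```python
import os
import posixpath


def default_password_file(oracle_home, oracle_sid):
    """Build $ORACLE_HOME/dbs/orapw<ORACLE_SID>, the default location described above."""
    return posixpath.join(oracle_home, "dbs", "orapw" + oracle_sid)


def password_file_present(oracle_home, oracle_sid):
    """True if a file exists at the default password file location."""
    return os.path.isfile(default_password_file(oracle_home, oracle_sid))


if __name__ == "__main__":
    # Hypothetical home and SID, for illustration only.
    print(default_password_file("/u01/app/oracle/product/12.2.0/dbhome_1", "ANBOB1"))
```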

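Summary

When fixing a problem by changing a hidden parameter, also confirm the side effects of the change. Setting _notify_crs=false works around the 12.2 problem described at the start, and the bugs listed above can be worked around the same way. With this setting, however, newly added ASM disk groups are no longer registered automatically as CRS resources. The instance password file registered in CRS is also no longer used; the default location is used instead, which can break SYSDBA logins.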


Oracle 12c R2 notes: multiple "/usr/bin/ssh -o StrictHostKeyChecking… /sbin/ifconfig -a" processes cause high CPU usage


Another problematic feature in 12c R2 RAC environments that also causes high CPU usage.

$ ps -ef|grep ifconfig
root 19141 1 0 06:25 ? 00:00:00 sh -c /bin/su -l grid -c "/usr/bin/ssh -o StrictHostKeyChecking=no -o BatchMode=yes -o ConnectTimeout=5 ANBOB2 /sbin/ifconfig -a" 2>&1
root 13442 18941 99 06:25 ? 06:07:08 /bin/su -l grid -c /usr/bin/ssh -o StrictHostKeyChecking=no -o BatchMode=yes -o ConnectTimeout=5 ANBOB2 /sbin/ifconfig -a
grid 26911 23166 0 12:32 pts/1 00:00:00 grep ifconfig
root 23231 1 0 Jan23 ? 00:00:00 sh -c /bin/su -l grid -c "/usr/bin/ssh -o StrictHostKeyChecking=no -o BatchMode=yes -o ConnectTimeout=5 ANBOB2 /sbin/ifconfig -a" 2>&1
root 62143 23231 99 Jan23 ? 14:29:31 /bin/su -l grid -c /usr/bin/ssh -o StrictHostKeyChecking=no -o BatchMode=yes -o ConnectTimeout=5 ANBOB2 /sbin/ifconfig -a
root 77112 1 0 10:30 ? 00:00:00 sh -c /bin/su -l grid -c "/usr/bin/ssh -o StrictHostKeyChecking=no -o BatchMode=yes -o ConnectTimeout=5 ANBOB2 /sbin/ifconfig -a" 2>&1
root 75443 77170 99 10:30 ? 02:02:37 /bin/su -l grid -c /usr/bin/ssh -o StrictHostKeyChecking=no -o BatchMode=yes -o ConnectTimeout=5 ANBOB2 /sbin/ifconfig -a

$top
 PID   USER PR NI VIRT RES  SHR  S %CPU %MEM TIME+ COMMAND
 62254 root 25 0 98.8m 1392 1104 R 100.0 0.0 851:33.36 su
 57942 root 25 0 98.8m 1400 1104 R 99.9  0.0 349:10.86 su
 52171 root 25 0 98.8m 1404 1104 R 99.9  0.0 104:39.33 su

According to MOS note 2340905.1, this is Bug 24692439 : LNX64-12.2-DIAGSNAP: AUXILIARY CMDS GENERATED BY DIAGSNAP WOULD HOG CPU FOREVER.
The fix is to disable diagsnap and then manually kill these su processes.
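
Before killing anything, the runaway su processes can be picked out of the ps -ef output shown above. This is a minimal Python sketch, assuming the standard eight-column ps -ef layout (UID PID PPID C STIME TTY TIME CMD). The function name and the CPU threshold of 90 are illustrative choices.

```python
def runaway_su_pids(ps_output, min_cpu=90):
    """Return PIDs of '/bin/su ... /sbin/ifconfig -a' processes whose C column is >= min_cpu."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 7)  # the 8th field keeps the full command line
        if len(fields) < 8 or not fields[1].isdigit() or not fields[3].isdigit():
            continue  # header line or malformed line
        pid, cpu, cmd = int(fields[1]), int(fields[3]), fields[7]
        if cmd.startswith("/bin/su ") and "/sbin/ifconfig -a" in cmd and cpu >= min_cpu:
            pids.append(pid)
    return pids
```

The function only reports candidates. Review the list before passing any PID to kill.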

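What is diagsnap?

diagsnap was introduced and integrated with GI so that node reboots and node evictions can be diagnosed without being blocked by missing network and OS-level information. It is a new process introduced in GI 12.1.0.2. CHM's osysmond manages the diagsnap resource, which collects additional OS statistics that CHM does not normally gather. diagsnap collection runs automatically every 15 minutes and is also triggered by certain events:

1. when cssd detects missed network heartbeats
2. when gipcd detects interfaces going up or down
3. gipcd rank events

diagsnap invokes the following operating system commands: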

 

iostat
netstat
lsof <gipcd pid/ocssd pid/crsd pid/ohasd pid>
arp
ifconfig
ping over the private interconnect
tcpdump
top

Disabling diagsnap

Run as the GI owner (grid):

$GI_HOME/bin/oclumon manage -disable diagsnap

Diagsnap option is successfully Disabled on ANBOB1
Diagsnap option is successfully Disabled on ANBOB2
Successfully Disabled diagsnap

If the command fails on 12.1, run "diagsnap.pl deregister" as root and manually edit $GI_HOME/crf/admin/crf<hostname>.ora on each node to confirm that PSTACK=DISABLE and DIAGSNAP=DISABLE are set.
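
The two settings can be checked on each node with a small script. This is a minimal Python sketch, assuming the crf<hostname>.ora file consists of simple KEY=VALUE lines; that format is an assumption and the function name is illustrative.

```python
def diagsnap_disabled(crf_text):
    """True if both PSTACK=DISABLE and DIAGSNAP=DISABLE appear as KEY=VALUE lines."""
    settings = {}
    for line in crf_text.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            settings[key.strip().upper()] = value.strip().upper()
    return settings.get("PSTACK") == "DISABLE" and settings.get("DIAGSNAP") == "DISABLE"
```

Read the file contents on each node and pass the text to the function. A False result means at least one of the two settings is missing or still enabled.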

Oracle 12c R2 notes: high CPU usage by the SCM0 process


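This is the third high-CPU case in 12c: the scm0 process consuming a lot of CPU. The previous two posts are:

Oracle 12c R2 notes: many crsctl.bin processes with high CPU usage, waiting on "crs call completion"

Oracle 12c R2 notes: multiple "/usr/bin/ssh -o StrictHostKeyChecking… /sbin/ifconfig -a" processes cause high CPU usage

The DLM (Distributed Lock Management) Statistics Collection and Management slave (SCM0) background process collects and manages statistics for the Global Enqueue Service (GES) and the Global Cache Service (GCS). The process exists only when DLM statistics collection is enabled in the database. According to Oracle, however, even though DLM statistics are collected by default in 12.2, the features that use them (service-based affinity and cache warmup) are disabled in that release and the collection is only groundwork for later versions, so disabling the process does not affect 12c R2.

The cause is bug 24590018 – RAC PERF: SCM0 PROCESS USING 100% CPU, FG’S USING ~80% SYS CPU POSTING SCM0

To resolve this, either disable DLM statistics collection or kill the scm0 process manually; the process restarts automatically.

To disable DLM statistics collection: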

SQL> alter system set "_dlm_stats_collect" = 0 scope = spfile sid = '*';

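Note that changing the _dlm_stats_collect parameter requires a database restart.

The other option is to find the process ID of the scm0 process and terminate it with kill -9. The process restarts automatically afterwards.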

[root@anbob01 ~]# ps -ef|grep scm
oracle 11762 1 0 May22 ? 00:02:28 ora_scm0_ANBOB1

[root@anbob01 ~]# kill -9 11762
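
On hosts that run several instances, the scm0 PIDs can be collected per instance from the ps output. This is a minimal Python sketch, assuming the standard eight-column ps -ef layout and the ora_scm0_<SID> process naming seen above; the function name is illustrative.

```python
import re


def scm0_pids(ps_output):
    """Return {instance_name: pid} for ora_scm0_<SID> background processes in 'ps -ef' output."""
    found = {}
    for line in ps_output.splitlines():
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # header line or malformed line
        m = re.fullmatch(r"ora_scm0_(\S+)", fields[7].strip())
        if m:
            found[m.group(1)] = int(fields[1])
    return found
```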

 

It is recommended to set the following parameter when installing 12c R2:

alter system set "_dlm_stats_collect" = 0 scope = spfile sid = '*';

query dba_free_space(tablespace usage) slow after upgrade 12c R2


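Not long ago there was a case of migrating off Exadata while upgrading from 11g R2 to 12c R2. After the upgrade, users reported that the script used to query tablespace usage ran much longer than before, taking several minutes. This is usually caused by too many objects in the recycle bin (recyclebin$) and is fixed by purging it, but this time the recycle bin held very few objects (<100). The database is about 50 TB with roughly 350 data files.

I had time today to analyze it. The next step is of course to look at the SQL execution plan; SQL Monitor is used here.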

 SQL> select dbms_sqltune.report_sql_monitor(sql_id=>'&sql_id',report_level=>'ALL',type=>'text') from dual;  
Enter value for sql_id: 19bgcf8grxdxm

DBMS_SQLTUNE.REPORT_SQL_MONITOR(SQL_ID=>'19BGCF8GRXDXM',REPORT_LEVEL=>'ALL',TYPE=>'TEXT')
--------------------------------------------------------------------------------------------------------------------
SQL Monitoring Report

SQL Text
------------------------------
select t.tablespace_name, t.mb "TotalMB", t.mb - nvl(f.mb,0) "UsedMB", nvl(f.mb,0) "FreeMB" ,lpad(ceil((1-nvl(f.mb,0)/decode(t.mb,0,1,t.mb))*100)||'%', 6) "% Used", t.ext "Ext", '|'||rpad(lpad('#',ceil((1-nvl(f.mb,0)/decode(t.mb,0,1,t.mb))*20),'#'),20,' ')||'|' "Used" from ( select tablespace_name, trunc(sum(bytes)/1048576) MB from dba_free_space group by tablespace_name union all select tablespace_name, trunc(sum(bytes_free)/1048576) MB from v$temp_space_header group by tablespace_name ) f, (
select tablespace_name, trunc(sum(bytes)/1048576) MB, max(autoextensible) ext from dba_data_files group by tablespace_name union all select tablespace_name, trunc(sum(bytes)/1048576) MB, max(autoextensible) ext from dba_temp_files group by tablespace_name ) t where t.tablespace_name = f.tablespace_name (+) order by t.tablespace_name

Global Information
------------------------------
 Status              :  EXECUTING
 Instance ID         :  1
 Session             :  SYS (614:29445)
 SQL ID              :  19bgcf8grxdxm
 SQL Execution ID    :  16777220
 Execution Started   :  02/25/2019 16:20:58
 First Refresh Time  :  02/25/2019 16:21:02
 Last Refresh Time   :  02/25/2019 16:22:24
 Duration            :  87s
 Module/Action       :  sqlplus@kdrpt01 (TNS V1-V3)/-
 Service             :  SYS$USERS
 Program             :  sqlplus@kdrpt01 (TNS V1-V3)

Global Stats
===================================================================
| Elapsed |   Cpu   |    IO    | Cluster  | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) | Waits(s) |  Gets  | Reqs | Bytes |
===================================================================
|      96 |      28 |       60 |     9.15 |     1M | 168K |   1GB |
===================================================================

Parallel Execution Details (DOP=2 , Servers Allocated=4)
========================================================================================================================================
|      Name      | Type | Server# | Elapsed |   Cpu   |    IO    | Cluster  | Buffer | Read | Read  |           Wait Events            |
|                |      |         | Time(s) | Time(s) | Waits(s) | Waits(s) |  Gets  | Reqs | Bytes |            (sample #)            |
========================================================================================================================================
| PX Coordinator | QC   |         |      96 |      28 |       60 |     9.15 |     1M | 168K |   1GB | gc cr disk read (12)             |
|                |      |         |         |         |          |          |        |      |       | control file sequential read (2) |
|                |      |         |         |         |          |          |        |      |       | db file sequential read (54)     |
========================================================================================================================================

Instance Drill-Down
=================================================================================================================================
| Instance | Process Names | Elapsed |   Cpu   |    IO    | Cluster  | Buffer | Read | Read  | Wait Events                      |
|          |               | Time(s) | Time(s) | Waits(s) | Waits(s) |  Gets  | Reqs | Bytes |                                  |
=================================================================================================================================
|    1     | QC            |      96 |      28 |       60 |     9.15 |     1M | 168K |   1GB | gc cr disk read (12)             |
|          |               |         |         |          |          |        |      |       | control file sequential read (3) |
|          |               |         |         |          |          |        |      |       | db file sequential read (54)     |
=================================================================================================================================

SQL Plan Monitoring Details (Plan Hash Value=259291012)
============================================================================================================================================================================================================
| Id    |                Operation                 |           Name            |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read | Read  | Mem | Activity |         Activity Detail          |
|       |                                          |                           | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes |     |   (%)    |           (# samples)            |
============================================================================================================================================================================================================
|     0 | SELECT STATEMENT                         |                           |         |      |           |        |     1 |          |      |       |   . |          |                                  |
|     1 |   SORT ORDER BY                          |                           |       8 |  122 |           |        |     1 |          |      |       |   . |          |                                  |
|     2 |    HASH JOIN OUTER                       |                           |       8 |  121 |         1 |     +4 |     1 |        0 |      |       | 1MB |          |                                  |
|     3 |     VIEW                                 |                           |       8 |   22 |         1 |     +4 |     1 |       26 |      |       |   . |          |                                  |
|     4 |      UNION-ALL                           |                           |         |      |         1 |     +4 |     1 |       26 |      |       |   . |          |                                  |
|     5 |       HASH GROUP BY                      |                           |       6 |   17 |         1 |     +4 |     1 |       25 |      |       |   . |          |                                  |
|     6 |        VIEW                              | DBA_DATA_FILES            |       6 |   16 |         1 |     +4 |     1 |     1342 |      |       |   . |          |                                  |
|     7 |         UNION-ALL                        |                           |         |      |         1 |     +4 |     1 |     1342 |      |       |   . |          |                                  |
|     8 |          NESTED LOOPS                    |                           |       1 |    6 |           |        |     1 |          |      |       |   . |          |                                  |
|     9 |           NESTED LOOPS                   |                           |       1 |    5 |           |        |     1 |          |      |       |   . |          |                                  |
|    10 |            NESTED LOOPS                  |                           |       1 |    5 |         1 |     +4 |     1 |        0 |      |       |   . |          |                                  |
|    11 |             FIXED TABLE FULL             | X$KCCFN                   |       5 |      |         1 |     +4 |     1 |     1342 |   23 |   4MB |   . |          |                                  |
|    12 |             TABLE ACCESS BY INDEX ROWID  | FILE$                     |       1 |    1 |         1 |     +4 |  1342 |        0 |      |       |   . |          |                                  |
|    13 |              INDEX UNIQUE SCAN           | I_FILE1                   |       1 |      |         1 |     +4 |  1342 |     1342 |      |       |   . |          |                                  |
|    14 |            FIXED TABLE FIXED INDEX       | X$KCCFE (ind:1)           |       1 |      |           |        |       |          |      |       |   . |          |                                  |
|    15 |           TABLE ACCESS CLUSTER           | TS$                       |       1 |    1 |           |        |       |          |      |       |   . |          |                                  |
|    16 |            INDEX UNIQUE SCAN             | I_TS#                     |       1 |      |           |        |       |          |      |       |   . |          |                                  |
|    17 |          NESTED LOOPS                    |                           |       5 |   10 |         1 |     +4 |     1 |     1342 |      |       |   . |          |                                  |
|    18 |           NESTED LOOPS                   |                           |       5 |    5 |         1 |     +4 |     1 |     1342 |      |       |   . |          |                                  |
|    19 |            NESTED LOOPS                  |                           |       5 |    5 |         1 |     +4 |     1 |     1342 |      |       |   . |          |                                  |
|    20 |             NESTED LOOPS                 |                           |       5 |      |         1 |     +4 |     1 |     1342 |      |       |   . |          |                                  |
|    21 |              FIXED TABLE FULL            | X$KCCFN                   |       5 |      |         1 |     +4 |     1 |     1342 |   23 |   4MB |   . |          |                                  |
|    22 |              FIXED TABLE FIXED INDEX     | X$KTFBHC (ind:1)          |       1 |      |         1 |     +4 |  1342 |     1342 |      |       |   . |          |                                  |
|    23 |             TABLE ACCESS BY INDEX ROWID  | FILE$                     |       1 |    1 |         1 |     +4 |  1342 |     1342 |      |       |   . |          |                                  |
|    24 |              INDEX UNIQUE SCAN           | I_FILE1                   |       1 |      |         1 |     +4 |  1342 |     1342 |      |       |   . |          |                                  |
|    25 |            FIXED TABLE FIXED INDEX       | X$KCCFE (ind:1)           |       1 |      |         4 |     +1 |  1342 |     1342 | 5407 |  84MB |   . |     2.25 | control file sequential read (2) |
|    26 |           TABLE ACCESS CLUSTER           | TS$                       |       1 |    1 |         1 |     +4 |  1342 |     1342 |      |       |   . |          |                                  |
|    27 |            INDEX UNIQUE SCAN             | I_TS#                     |       1 |      |         1 |     +4 |  1342 |     1342 |      |       |   . |          |                                  |
|    28 |       HASH GROUP BY                      |                           |       2 |    5 |         1 |     +4 |     1 |        1 |      |       |   . |          |                                  |
|    29 |        VIEW                              | DBA_TEMP_FILES            |       2 |    4 |         1 |     +4 |     1 |        8 |      |       |   . |          |                                  |
|    30 |         SORT UNIQUE                      |                           |       2 |    4 |         1 |     +4 |     1 |        8 |      |       |   . |          |                                  |
|    31 |          UNION-ALL                       |                           |         |      |         1 |     +4 |     1 |        8 |      |       |   . |          |                                  |
|    32 |           NESTED LOOPS                   |                           |       1 |    1 |         1 |     +4 |     1 |        8 |      |       |   . |          |                                  |
|    33 |            NESTED LOOPS                  |                           |       1 |      |         1 |     +4 |     1 |        8 |      |       |   . |          |                                  |
|    34 |             NESTED LOOPS                 |                           |       1 |      |         1 |     +4 |     1 |        8 |      |       |   . |          |                                  |
|    35 |              NESTED LOOPS                |                           |       1 |      |         1 |     +4 |     1 |        8 |      |       |   . |          |                                  |
|    36 |               FIXED TABLE FULL           | X$KCCTF                   |       1 |      |         1 |     +4 |     1 |        8 |    4 | 65536 |   . |          |                                  |
|    37 |               FIXED TABLE FIXED INDEX    | X$KCCFN (ind:1)           |       1 |      |         1 |     +4 |     8 |        8 |   32 | 512KB |   . |          |                                  |
|    38 |              FIXED TABLE FIXED INDEX     | X$KCVFHTMP (ind:1)        |       1 |      |         1 |     +4 |     8 |        8 |   40 | 576KB |   . |          |                                  |
|    39 |             FIXED TABLE FIXED INDEX      | X$KTFTHC (ind:2)          |       1 |      |         1 |     +4 |     8 |        8 |      |       |   . |          |                                  |
|    40 |            TABLE ACCESS CLUSTER          | TS$                       |       1 |    1 |         1 |     +4 |     8 |        8 |      |       |   . |          |                                  |
|    41 |             INDEX UNIQUE SCAN            | I_TS#                     |       1 |      |         1 |     +4 |     8 |        8 |      |       |   . |          |                                  |
|    42 |           NESTED LOOPS                   |                           |       1 |    1 |         1 |     +4 |     1 |        0 |      |       |   . |          |                                  |
|    43 |            HASH JOIN                     |                           |       1 |      |         1 |     +4 |     1 |       16 |      |       |   . |          |                                  |
|    44 |             PX COORDINATOR               |                           |         |      |         1 |     +4 |     1 |       16 |      |       |   . |          |                                  |
|    45 |              PX SEND QC (RANDOM)         | :TQ10000                  |       1 |      |           |        |       |          |      |       |   . |          |                                  |
|    46 |               VIEW                       | GV$TEMPFILE_INFO_INSTANCE |         |      |           |        |       |          |      |       |   . |          |                                  |
|    47 |                NESTED LOOPS              |                           |       1 |      |           |        |       |          |      |       |   . |          |                                  |
|    48 |                 NESTED LOOPS             |                           |       1 |      |           |        |       |          |      |       |   . |          |                                  |
|    49 |                  MERGE JOIN CARTESIAN    |                           |       1 |      |           |        |       |          |      |       |   . |          |                                  |
|    50 |                   FIXED TABLE FULL       | X$KCCTF                   |       1 |      |           |        |       |          |      |       |   . |          |                                  |
|    51 |                   BUFFER SORT            |                           |       1 |      |           |        |       |          |      |       |   . |          |                                  |
|    52 |                    FIXED TABLE FULL      | X$KCVFHTMP                |       1 |      |           |        |       |          |      |       |   . |          |                                  |
|    53 |                  FIXED TABLE FIXED INDEX | X$KTFTHC (ind:2)          |       1 |      |           |        |       |          |      |       |   . |          |                                  |
|    54 |                 FIXED TABLE FIXED INDEX  | X$KCCFN (ind:1)           |       1 |      |         1 |     +3 |       |          |      |       |   . |     2.25 | control file sequential read (2) |
|    55 |             PX COORDINATOR               |                           |         |      |         1 |     +4 |     1 |        2 |      |       |   . |          |                                  |
|    56 |              PX SEND QC (RANDOM)         | :TQ20000                  |       1 |      |           |        |       |          |      |       |   . |          |                                  |
|    57 |               VIEW                       | GV$INSTANCE               |         |      |           |        |       |          |      |       |   . |          |                                  |
|    58 |                MERGE JOIN CARTESIAN      |                           |       1 |      |           |        |       |          |      |       |   . |          |                                  |
|    59 |                 MERGE JOIN CARTESIAN     |                           |       1 |      |           |        |       |          |      |       |   . |          |                                  |
|    60 |                  MERGE JOIN CARTESIAN    |                           |       1 |      |           |        |       |          |      |       |   . |          |                                  |
|    61 |                   FIXED TABLE FULL       | X$KSUXSINST               |       1 |      |           |        |       |          |      |       |   . |          |                                  |
|    62 |                   BUFFER SORT            |                           |       1 |      |           |        |       |          |      |       |   . |          |                                  |
|    63 |                    FIXED TABLE FULL      | X$QUIESCE                 |       1 |      |           |        |       |          |      |       |   . |          |                                  |
|    64 |                  BUFFER SORT             |                           |       1 |      |           |        |       |          |      |       |   . |          |                                  |
|    65 |                   FIXED TABLE FULL       | X$KJIDT                   |       1 |      |           |        |       |          |      |       |   . |          |                                  |
|    66 |                 BUFFER SORT              |                           |       1 |      |           |        |       |          |      |       |   . |          |                                  |
|    67 |                  FIXED TABLE FULL        | X$KVIT                    |       1 |      |           |        |       |          |      |       |   . |          |                                  |
|    68 |            TABLE ACCESS CLUSTER          | TS$                       |       1 |    1 |         1 |     +4 |    16 |        0 |      |       |   . |          |                                  |
|    69 |             INDEX UNIQUE SCAN            | I_TS#                     |       1 |      |         1 |     +4 |    16 |       16 |      |       |   . |          |                                  |
|    70 |     VIEW                                 |                           |      26 |   99 |           |        |     1 |          |      |       |   . |          |                                  |
|    71 |      UNION-ALL                           |                           |         |      |           |        |     1 |          |      |       |   . |          |                                  |
|    72 |       HASH GROUP BY                      |                           |      25 |   89 |        71 |     +4 |     1 |        0 |      |       | 1MB |          |                                  |
|    73 |        VIEW                              | DBA_FREE_SPACE            |     123 |   88 |        71 |     +4 |     1 |    99234 |      |       |   . |          |                                  |
|    74 |         UNION-ALL                        |                           |         |      |        71 |     +4 |     1 |    99234 |      |       |   . |          |                                  |
|    75 |          NESTED LOOPS                    |                           |       1 |    4 |           |        |     1 |          |      |       |   . |          |                                  |
|    76 |           NESTED LOOPS                   |                           |       1 |    4 |           |        |     1 |          |      |       |   . |          |                                  |
|    77 |            TABLE ACCESS FULL             | FET$                      |       1 |    4 |           |        |     1 |          |      |       |   . |          |                                  |
|    78 |            TABLE ACCESS CLUSTER          | TS$                       |       1 |      |           |        |       |          |      |       |   . |          |                                  |
|    79 |             INDEX UNIQUE SCAN            | I_TS#                     |       1 |      |           |        |       |          |      |       |   . |          |                                  |
|    80 |           INDEX UNIQUE SCAN              | I_FILE2                   |       1 |      |           |        |       |          |      |       |   . |          |                                  |
|    81 |          NESTED LOOPS                    |                           |      31 |    9 |         3 |     +4 |     1 |    96484 |      |       |   . |          |                                  |
|    82 |           NESTED LOOPS                   |                           |      31 |    9 |         3 |     +4 |     1 |    96484 |      |       |   . |          |                                  |
|    83 |            TABLE ACCESS FULL             | TS$                       |      25 |    9 |         3 |     +4 |     1 |       25 |      |       |   . |          |                                  |
|    84 |            FIXED TABLE FIXED INDEX       | X$KTFBFE (ind:1)          |       1 |      |         3 |     +4 |    25 |    96484 |      |       |   . |     1.12 | Cpu (1)                          |
|    85 |           INDEX UNIQUE SCAN              | I_FILE2                   |       1 |      |         3 |     +4 | 96484 |    96484 |      |       |   . |          |                                  |
|    86 |          NESTED LOOPS                    |                           |      89 |   60 |        67 |     +8 |     1 |     2750 |      |       |   . |          |                                  |
| -> 87 |           HASH JOIN                      |                           |     639 |   60 |        85 |     +6 |     1 |     2750 |      |       | 1MB |     1.12 | Cpu (1)                          |
|    88 |            NESTED LOOPS                  |                           |      56 |   39 |         1 |     +6 |     1 |       57 |      |       |   . |          |                                  |
|    89 |             NESTED LOOPS                 |                           |     275 |   39 |         1 |     +6 |     1 |       57 |      |       |   . |          |                                  |
|    90 |              TABLE ACCESS FULL           | TS$                       |      25 |    9 |         1 |     +6 |     1 |       25 |      |       |   . |          |                                  |
|    91 |              INDEX RANGE SCAN            | RECYCLEBIN$_TS            |      11 |    1 |         1 |     +6 |    25 |       57 |      |       |   . |          |                                  |
|    92 |             TABLE ACCESS BY INDEX ROWID  | RECYCLEBIN$               |       2 |    2 |         1 |     +6 |    57 |       57 |      |       |   . |          |                                  |
| -> 93 |            FIXED TABLE FULL              | X$KTFBUE                  |    100K |   20 |        86 |     +5 |     1 |       8M | 163K |   1GB |   . |    91.01 | gc cr disk read (12)             |
|       |                                          |                           |         |      |           |        |       |          |      |       |     |          | Cpu (15)                         |
|       |                                          |                           |         |      |           |        |       |          |      |       |     |          | db file sequential read (54)     |
| -> 94 |           INDEX UNIQUE SCAN              | I_FILE2                   |       1 |      |        83 |     +8 |  2750 |     2750 |      |       |   . |          |                                  |
|    95 |          NESTED LOOPS                    |                           |       1 |   12 |           |        |       |          |      |       |   . |          |                                  |
|    96 |           NESTED LOOPS                   |                           |      11 |   12 |           |        |       |          |      |       |   . |          |                                  |
|    97 |            NESTED LOOPS                  |                           |       1 |   10 |           |        |       |          |      |       |   . |          |                                  |
|    98 |             NESTED LOOPS                 |                           |       1 |   10 |           |        |       |          |      |       |   . |          |                                  |
|    99 |              TABLE ACCESS FULL           | TS$                       |       1 |    9 |           |        |       |          |      |       |   . |          |                                  |
|   100 |              TABLE ACCESS CLUSTER        | UET$                      |       1 |    1 |           |        |       |          |      |       |   . |          |                                  |
|   101 |               INDEX RANGE SCAN           | I_FILE#_BLOCK#            |       1 |    1 |           |        |       |          |      |       |   . |          |                                  |
|   102 |             INDEX UNIQUE SCAN            | I_FILE2                   |       1 |      |           |        |       |          |      |       |   . |          |                                  |
|   103 |            INDEX RANGE SCAN              | RECYCLEBIN$_TS            |      11 |    1 |           |        |       |          |      |       |   . |          |                                  |
|   104 |           TABLE ACCESS BY INDEX ROWID    | RECYCLEBIN$               |       1 |    2 |           |        |       |          |      |       |   . |          |                                  |
|   105 |          NESTED LOOPS                    |                           |       1 |    3 |           |        |       |          |      |       |   . |          |                                  |
|   106 |           NESTED LOOPS                   |                           |       1 |    2 |           |        |       |          |      |       |   . |          |                                  |
|   107 |            TABLE ACCESS FULL             | NEW_LOST_WRITE_EXTENTS$   |       1 |    2 |           |        |       |          |      |       |   . |          |                                  |
|   108 |            TABLE ACCESS CLUSTER          | TS$                       |       1 |      |           |        |       |          |      |       |   . |          |                                  |
|   109 |             INDEX UNIQUE SCAN            | I_TS#                     |       1 |      |           |        |       |          |      |       |   . |          |                                  |
|   110 |           INDEX RANGE SCAN               | I_FILE2                   |       1 |    1 |           |        |       |          |      |       |   . |          |                                  |
|   111 |       HASH GROUP BY                      |                           |       1 |   10 |           |        |       |          |      |       |   . |          |                                  |
|   112 |        NESTED LOOPS                      |                           |       1 |    9 |           |        |       |          |      |       |   . |          |                                  |
|   113 |         TABLE ACCESS FULL                | TS$                       |       1 |    9 |           |        |       |          |      |       |   . |          |                                  |
|   114 |         FIXED TABLE FIXED INDEX          | X$KTFTHC (ind:2)          |       1 |      |           |        |       |          |      |       |   . |          |                                  |
============================================================================================================================================================================================================

Note:
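The plan shows that step 93 accounts for most of the elapsed time (about 91% of the sampled activity): a full scan of the fixed table X$KTFBUE. The optimizer estimated 100K rows, but the scan had already returned 8M rows when the report was taken, an 80-fold underestimate. When X$KTFBUE has no statistics, the default estimate appears to be 100,000 rows.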
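
To make the size of the misestimate concrete, here is a minimal Python sketch that expands the abbreviated counts from the SQL Monitor report and compares estimated and actual rows for plan line 93. The helper names (`parse_count`, `misestimate_ratio`) are made up for illustration, and treating K and M as decimal multipliers (1,000 and 1,000,000) is an assumption about how the report abbreviates numbers.

```python
# Assumption: the report abbreviates with decimal suffixes (K = 1,000, M = 1,000,000).
SUFFIXES = {"K": 1_000, "M": 1_000_000, "G": 1_000_000_000}


def parse_count(text: str) -> int:
    """Convert an abbreviated count such as '100K' or '8M' to an integer."""
    text = text.strip().upper()
    if text and text[-1] in SUFFIXES:
        return int(float(text[:-1]) * SUFFIXES[text[-1]])
    return int(text)


def misestimate_ratio(estimated: str, actual: str) -> float:
    """How many times larger the actual row count is than the estimate."""
    return parse_count(actual) / parse_count(estimated)


# Values copied from plan line 93 (FIXED TABLE FULL on X$KTFBUE).
estimated_rows = "100K"
actual_rows = "8M"

ratio = misestimate_ratio(estimated_rows, actual_rows)
print(
    f"X$KTFBUE: estimated {parse_count(estimated_rows):,} rows, "
    f"actual {parse_count(actual_rows):,} rows, off by {ratio:.0f}x"
)
```

A gap this large is usually enough to explain why the optimizer picked a full scan feeding a hash join instead of an indexed probe into the fixed table.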

View:   X$KTFBUE

Desc.:     [K]ernel [T]ablespace [F]ile [B]itmapped     [U]sed [E]xtents

Check whether this fixed table has statistics:

SQL> col owner for a30
SQL> col table_name for a30
SQL> select owner,table_name,object_type,NUM_ROWS,BLOCKS,LAST_ANALYZED from dba_tab_statistics where  table_name='X$KTFBUE';

OWNER                          TABLE_NAME                     OBJECT_TYPE    NUM_ROWS     BLOCKS LAST_ANAL
------------------------------ ------------------------------ ------------ ---------- ---------- ---------
SYS                            X$KTFBUE                       FIXED TABLE

Note:
X$KTFBUE has no statistics. Next, try DBMS_STATS.GATHER_FIXED_OBJECTS_STATS.

SQL>  exec DBMS_STATS.GATHER_FIXED_OBJECTS_STATS;

PL/SQL procedure successfully completed.

Elapsed: 00:01:53.92
SQL> select owner,table_name,object_type,NUM_ROWS,BLOCKS,LAST_ANALYZED from dba_tab_statistics where  table_name='X$KTFBUE';

OWNER                          TABLE_NAME                     OBJECT_TYPE    NUM_ROWS     BLOCKS LAST_ANAL
------------------------------ ------------------------------ ------------ ---------- ---------- ---------
SYS                            X$KTFBUE                       FIXED TABLE

SQL> SELECT count(*),count(last_analyzed),sum(decode(last_analyzed,null,1,0)) FROM DBA_TAB_STATISTICS where OBJECT_TYPE='FIXED TABLE';

  COUNT(*) COUNT(LAST_ANALYZED) SUM(DECODE(LAST_ANALYZED,NULL,1,0))
---------- -------------------- -----------------------------------
      1335                 1180                                 155

Note:
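Even after DBMS_STATS.GATHER_FIXED_OBJECTS_STATS completes successfully, many fixed tables still have no statistics (155 of 1335 here), including X$KTFBUE, the one behind this problem. Interesting, isn't it? MOS note ID 1355608.1 documents the reason: Oracle development deliberately flags certain fixed tables in the code so that statistics gathering skips them, on the view that some fixed tables are better off without statistics.

To gather statistics on such a fixed table anyway, call gather_table_stats on it directly: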

SQL> EXEC DBMS_STATS.gather_table_stats('SYS','X$KTFBUE');

PL/SQL procedure successfully completed.

Elapsed: 00:00:03.70
SQL> select owner,table_name,object_type,NUM_ROWS,BLOCKS,LAST_ANALYZED from dba_tab_statistics where  table_name='X$KTFBUE';

OWNER                          TABLE_NAME                     OBJECT_TYPE    NUM_ROWS     BLOCKS LAST_ANAL
------------------------------ ------------------------------ ------------ ---------- ---------- ---------
SYS                            X$KTFBUE                       FIXED TABLE        3839            17-MAR-19

SQL> @df

TABLESPACE_NAME                   TotalMB     UsedMB     FreeMB % Used Ext Used
------------------------------ ---------- ---------- ---------- ------ --- ----------------------
DBFSTS                              30720          1      30719     1% NO  |#                   |
...
26 rows selected.

Elapsed: 00:00:04.13

-- execution plan --
|  86 |         NESTED LOOPS                          |                           |     8 |   576 |    39   (0)| 00:00:01 |        |      |            |
|  87 |          NESTED LOOPS                         |                           |    56 |  3640 |    39   (0)| 00:00:01 |        |      |            |
|  88 |           NESTED LOOPS                        |                           |    56 |  2464 |    39   (0)| 00:00:01 |        |      |            |
|* 89 |            TABLE ACCESS FULL                  | TS$                       |    25 |   775 |     9   (0)| 00:00:01 |        |      |            |
|  90 |            TABLE ACCESS BY INDEX ROWID BATCHED| RECYCLEBIN$               |     2 |    26 |     2   (0)| 00:00:01 |        |      |            |
|* 91 |             INDEX RANGE SCAN                  | RECYCLEBIN$_TS            |    11 |       |     1   (0)| 00:00:01 |        |      |            |
|* 92 |           FIXED TABLE FIXED INDEX             | X$KTFBUE (ind:1)          |     1 |    21 |     0   (0)| 00:00:01 |        |      |            |
|* 93 |          INDEX UNIQUE SCAN                    | I_FILE2                   |     1 |     7 |     0   (0)| 00:00:01 |        |      |            |

Note:
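After gathering statistics on X$KTFBUE, the access path changed from FIXED TABLE FULL to FIXED TABLE FIXED INDEX, and the query now returns in about 4 seconds. Problem solved. On 11g R2, queries against dba_extents are built on the same fixed table and can run into the same problem; use DBMS_STATS.gather_table_stats('SYS','X$KTFBUE') to gather statistics for the fixed tables that GATHER_FIXED_OBJECTS_STATS skips.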

Troubleshooting internal error ORA-600 [kxspoac : EXL 1] after enabling event 10503

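A brief note on the internal error ORA-600 [kxspoac : EXL 1]. The environment is 11.2.0.3 RAC on HP-UX IA. The error is related to parallel query: once event 10503 is enabled, running a SQL statement with bind variables in parallel may raise it.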

adrci> show incident -all
1152791              ORA 600 [kxspoac : EXL 1]                                   2019-03-24 23:24:41.335000 +08:00       
1152792              ORA 600 [kxspoac : EXL 1]                                   2019-03-24 23:54:49.425000 +08:00       
1152793              ORA 600 [kxspoac : EXL 1]                                   2019-03-25 00:56:05.332000 +08:00       
1152794              ORA 600 [kxspoac : EXL 1]                                   2019-03-25 01:16:09.069000 +08:00       
1152795              ORA 600 [kxspoac : EXL 1]                                   2019-03-25 02:36:20.385000 +08:00       
1152796              ORA 600 [kxspoac : EXL 1]                                   2019-03-25 02:46:22.276000 +08:00       
1152797              ORA 600 [kxspoac : EXL 1]                                   2019-03-25 02:56:24.033000 +08:00       
1200492              ORA 600 [kxspoac : EXL 1]                                   2019-03-25 03:06:25.915000 +08:00       
1159670              ORA 600 [kxspoac : EXL 1]                                   2019-03-25 03:16:27.571000 +08:00       
1159671              ORA 600 [kxspoac : EXL 1]                                   2019-03-25 05:46:56.464000 +08:00       
1159672              ORA 600 [kxspoac : EXL 1]                                   2019-03-25 05:56:58.675000 +08:00       
1159673              ORA 600 [kxspoac : EXL 1]                                   2019-03-25 06:08:01.796000 +08:00
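
When the incident list is long, it can help to summarize it before opening individual traces. The following is a small Python sketch that parses the listing above, counts incidents per problem key, and measures the gaps between consecutive incidents. The regular expression reflects only the three columns shown here (incident id, problem key, create time); the function name `parse_incidents` is hypothetical, and the layout of adrci output in other versions is not something this sketch guarantees.

```python
import re
from collections import Counter
from datetime import datetime

# Listing copied from the adrci output above (trailing spaces removed).
ADRCI_LISTING = """\
1152791              ORA 600 [kxspoac : EXL 1]                                   2019-03-24 23:24:41.335000 +08:00
1152792              ORA 600 [kxspoac : EXL 1]                                   2019-03-24 23:54:49.425000 +08:00
1152793              ORA 600 [kxspoac : EXL 1]                                   2019-03-25 00:56:05.332000 +08:00
1152794              ORA 600 [kxspoac : EXL 1]                                   2019-03-25 01:16:09.069000 +08:00
1152795              ORA 600 [kxspoac : EXL 1]                                   2019-03-25 02:36:20.385000 +08:00
1152796              ORA 600 [kxspoac : EXL 1]                                   2019-03-25 02:46:22.276000 +08:00
1152797              ORA 600 [kxspoac : EXL 1]                                   2019-03-25 02:56:24.033000 +08:00
1200492              ORA 600 [kxspoac : EXL 1]                                   2019-03-25 03:06:25.915000 +08:00
1159670              ORA 600 [kxspoac : EXL 1]                                   2019-03-25 03:16:27.571000 +08:00
1159671              ORA 600 [kxspoac : EXL 1]                                   2019-03-25 05:46:56.464000 +08:00
1159672              ORA 600 [kxspoac : EXL 1]                                   2019-03-25 05:56:58.675000 +08:00
1159673              ORA 600 [kxspoac : EXL 1]                                   2019-03-25 06:08:01.796000 +08:00
"""

# Incident id, problem key, create time with UTC offset.
LINE_RE = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<key>ORA \d+ \[[^\]]*\])\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6} [+-]\d{2}:\d{2})\s*$"
)


def parse_incidents(listing: str):
    """Return (incident_id, problem_key, create_time) tuples sorted by time."""
    incidents = []
    for line in listing.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip headers, blank lines, anything unexpected
        created = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S.%f %z")
        incidents.append((int(match["id"]), match["key"], created))
    # Incident ids are not strictly chronological here, so sort by timestamp.
    return sorted(incidents, key=lambda inc: inc[2])


incidents = parse_incidents(ADRCI_LISTING)
by_key = Counter(key for _, key, _ in incidents)
gaps = [
    (later[2] - earlier[2]).total_seconds()
    for earlier, later in zip(incidents, incidents[1:])
]
shortest_gap = min(gaps)

for key, count in by_key.items():
    print(f"{count:3d}  {key}")
print(f"shortest gap between incidents: {shortest_gap / 60:.1f} minutes")
```

If the gaps cluster around a fixed interval, that may hint at a periodically scheduled query as the trigger. This is only an inference from the timestamps, not something the trace states.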

adrci> show incident -mode detail -p "incident_id=1159673"

ADR Home = /oracle/app/oracle/diag/rdbms/anbob/anbob1:
*************************************************************************

**********************************************************
INCIDENT INFO RECORD 1
**********************************************************
INCIDENT_ID 1159673
STATUS ready
CREATE_TIME 2019-03-25 06:08:01.796000 +08:00
PROBLEM_ID 8
CLOSE_TIME <NULL>
FLOOD_CONTROLLED none
ERROR_FACILITY ORA
ERROR_NUMBER 600
ERROR_ARG1 kxspoac : EXL 1
ERROR_ARG2 2000
ERROR_ARG3 32
ERROR_ARG4 <NULL>
ERROR_ARG5 <NULL>
ERROR_ARG6 <NULL>
ERROR_ARG7 <NULL>
ERROR_ARG8 <NULL>
ERROR_ARG9 <NULL>
ERROR_ARG10 <NULL>
ERROR_ARG11 <NULL>
ERROR_ARG12 <NULL>
SIGNALLING_COMPONENT cursor
SIGNALLING_SUBCOMPONENT <NULL>
SUSPECT_COMPONENT <NULL>
SUSPECT_SUBCOMPONENT <NULL>
ECID <NULL>
IMPACTS 0
PROBLEM_KEY ORA 600 [kxspoac : EXL 1]
FIRST_INCIDENT 1152270
FIRSTINC_TIME 2019-03-20 09:56:07.223000 +08:00
LAST_INCIDENT 1200518
LASTINC_TIME 2019-03-25 11:18:57.553000 +08:00
IMPACT1 0
IMPACT2 0
IMPACT3 0
IMPACT4 0
KEY_NAME PQ
KEY_VALUE (33571929, 1553465280)
KEY_NAME SID
KEY_VALUE 8225.5665
KEY_NAME ProcId
KEY_VALUE 957.47
KEY_NAME Client ProcId
KEY_VALUE oracle@qdtza1.14635_1
OWNER_ID 1
INCIDENT_FILE /oracle/app/oracle/diag/rdbms/anbob/anbob1/incident/incdir_1159673/anbob1_pz99_14635_i1159673.trc
OWNER_ID 1
INCIDENT_FILE /oracle/app/oracle/diag/rdbms/anbob/anbob1/trace/anbob1_pz99_14635.trc
OWNER_ID 1
INCIDENT_FILE /oracle/app/oracle/diag/rdbms/anbob/anbob1/incident/incdir_1159673/anbob1_m000_10497_i1159673_a.trc

ADR Home = /oracle/app/oracle/diag/clients/user_oracle/host_454940151_80:
*************************************************************************
0 rows fetched

ADR Home = /oracle/app/oracle/diag/tnslsnr/qdtza1/listener_gj:
*************************************************************************
0 rows fetched

adrci> show trace /oracle/app/oracle/diag/rdbms/anbob/anbob1/incident/incdir_1159673/anbob1_pz99_14635_i1159673.trc
Output the results to file: /tmp/utsout_2836_1_2.ado
"/tmp/utsout_2836_1_2.ado" 75083 lines, 5370280 characters
/oracle/app/oracle/diag/rdbms/anbob/anbob1/incident/incdir_1159673/anbob1_pz99_14635_i1159673.trc
----------------------------------------------------------
LEVEL PAYLOAD
----- ------------------------------------------------------------------------------------------------------------------------------------------------
Dump file /oracle/app/oracle/diag/rdbms/anbob/anbob1/incident/incdir_1159673/anbob1_pz99_14635_i1159673.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /oracle/app/oracle/product/11.2.0.3/dbhome_1
System name: HP-UX
Node name: qdtza1
Release: B.11.31
Version: U
Machine: ia64
Instance name: anbob1
Redo thread mounted by this instance: 1
Oracle process number: 957
Unix process pid: 14635, image: oracle@qdtza1 (PZ99)

*** 2019-03-25 06:08:01.800
*** SESSION ID:(8225.5665) 2019-03-25 06:08:01.800
*** CLIENT ID:() 2019-03-25 06:08:01.800
*** SERVICE NAME:(SYS$USERS) 2019-03-25 06:08:01.800
*** MODULE NAME:() 2019-03-25 06:08:01.800
*** ACTION NAME:() 2019-03-25 06:08:01.800

Dump continued from file: /oracle/app/oracle/diag/rdbms/anbob/anbob1/trace/anbob1_pz99_14635.trc
1> ***** Error Stack *****
ORA-00600: internal error code, arguments: [kxspoac : EXL 1], [2000], [32], [], [], [], [], [], [], [], [], []
1< ***** Error Stack *****
1> ***** Dump for incident 1159673 (ORA 600 [kxspoac : EXL 1]) *****

*** 2019-03-25 06:08:01.803
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
2> ***** Current SQL Statement for this session (sql_id=6mvfay19q3v4n) *****
SELECT COUNT(CLIENT_INFO) FROM GV$SESSION WHERE USERNAME=:B2 AND CLIENT_INFO = :B1 GROUP BY CLIENT_INFO
2< ***** current_sql_statement *****

2> ***** Call Stack Trace *****

2< ***** call_stack_dump *****

skdstdst <- ksedst <- dbkedDefDump <- ksedmp <- ksfdmp <-
$cold_dbgexPhaseII <- dbgexProcessError <- dbgeExecuteForError <-
dbgePostErrorKGE <- dbkePostKGE_kgsf <-
kgeadse <- kgerinv_internal <- kgerinv <- kgeasnmierr <-
$cold_kxspoac <- opibnd0 <- opibnd <- kpoalbdf <- kpoal8 <-
opiodr <- kpoodr <- upirtrc <- kpurcsc <- kpuexec <-
OCIStmtExecute <- Error from uwx_step Stack is not Unwindab <- kxfxsStmtExecute <-
kxfxsExecute <- kxfxsp <- kxfxmai <-
***** call_stack_dump <-

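A search on MOS quickly turns up Note 810194.1 and the related Bug 5045992. MOS, however, describes the error as occurring after the 10.2.0.4 PSU, when the event 10503 behavior ignores the length of NUMBER bind variables. In our case both binds in the SQL against GV$SESSION, username and client_info, are VARCHAR, and the error occurs all the same.

Solution:

Remove event 10503 and restart the instance.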

10503 EVENT

在之前的版本中为了修正Bug 2450264,提高sql 游标的共享性,引入了event 10503,  EVENT 10503使用户能够指定字符绑定缓冲区长度,  然后使SQL cursor中使用相同的字符变量长度, 跳过字符变量不同刻度区间,可以使子游标chain相对较小, 有助力缓解sql cursor刻度共享问题, 因为默认字符绑定变量是分32, 128, 2000 和4000 bytes4个刻度,当多列在不同的刻度区间时就原来有可能产生倍数级的sql cursor. 之前的案例中提到过SQL ordered by Version Count and Troubleshooting.
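
The bucket arithmetic can be sketched in a few lines of Python. This is a deliberately simplified model, not Oracle's actual cursor-sharing logic: it assumes only the four bucket sizes named above (32, 128, 2000, 4000 bytes), rounds each character bind up to the smallest bucket that fits, and treats every distinct combination of buckets as one potential child cursor. The function names are hypothetical.

```python
from bisect import bisect_left

# Bucket sizes for character binds, as listed in the text above.
BUCKETS = (32, 128, 2000, 4000)


def bind_bucket(length_bytes: int) -> int:
    """Return the smallest bucket that can hold a bind of this length."""
    if not 0 <= length_bytes <= BUCKETS[-1]:
        raise ValueError(f"length {length_bytes} is outside 0..{BUCKETS[-1]}")
    return BUCKETS[bisect_left(BUCKETS, length_bytes)]


def bucket_signature(lengths):
    """Bucket combination for one execution, one entry per character bind."""
    return tuple(bind_bucket(n) for n in lengths)


def max_child_cursors(num_char_binds: int) -> int:
    """Upper bound on bucket combinations in this simplified model."""
    return len(BUCKETS) ** num_char_binds


# Three executions of a statement with two character binds;
# the second bind grows from 20 to 40 to 200 bytes (made-up lengths).
executions = [(6, 20), (6, 40), (6, 200)]
distinct = {bucket_signature(lengths) for lengths in executions}

# Simplified view of a fixed bind length: every bind lands in one bucket.
fixed_length = 2000
distinct_fixed = {
    tuple(bind_bucket(fixed_length) for _ in lengths) for lengths in executions
}

print(f"distinct bucket combinations by default: {len(distinct)}")
print(f"distinct combinations with a fixed length: {len(distinct_fixed)}")
print(f"upper bound for two character binds: {max_child_cursors(2)}")
```

In this model two character binds already allow up to 16 bucket combinations, and each extra character bind multiplies that by four, which is the multiplication the paragraph above describes.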

Let's look at a case.

SQL> @sw "select sid from v$session where event like 'library%'"

    SID STATE   EVENT                                          SEQ# SEC_IN_WAIT P1                  P2                  P3                  P1TRANSL
------- ------- ---------------------------------------- ---------- ----------- ------------------- ------------------- ------------------- ------------------------------------------
   8183 WAITING library cache: mutex X                          480         106 idn=                value=              where= 19
                                                                                0x000000003EB706D9  0x0000012800000000

   8463 WAITING library cache: mutex X                          334         154 idn=                value=              where= 85
                                                                                0x000000003EB706D9  0x0000012800000000

   3121 WAITING library cache: mutex X                          289         154 idn=                value=              where= 85
                                                                                0x000000003EB706D9  0x0000012800000000


SQL> select BLOCKING_INSTANCE,BLOCKING_SESSION_STATUS,BLOCKING_SESSION from v$session where sid=8183;

BLOCKING_INSTANCE BLOCKING_SE BLOCKING_SESSION
----------------- ----------- ----------------
                2 VALID                    296
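The blocker can also be cross-checked from the wait event itself. For mutex waits on 64-bit platforms, the high-order 4 bytes of the P2 value are commonly documented as the SID of the session holding the mutex. The Python sketch below decodes the value from the listing above under that assumption; the helper names are illustrative, not Oracle APIs.

```python
def mutex_holder_sid(p2_value: int) -> int:
    """High-order 4 bytes of the mutex value: assumed to be the holder's SID."""
    return (p2_value >> 32) & 0xFFFFFFFF

def mutex_low_word(p2_value: int) -> int:
    """Low-order 4 bytes of the mutex value."""
    return p2_value & 0xFFFFFFFF

p2 = 0x0000012800000000  # "value=" column from the library cache: mutex X waits above
print(mutex_holder_sid(p2), mutex_low_word(p2))  # prints: 296 0
```

The decoded SID 296 agrees with the BLOCKING_SESSION column returned by the query above.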

SQL> @s 296

    SID SQLID_AND_CHILD      STATUS   STATE   EVENT                                          SEQ# SEC_IN_WAIT BLOCKING_SID P1                 P2                 P3                 P1TRANSL
------- -------------------- -------- ------- ---------------------------------------- ---------- ----------- ------------ ------------------ ------------------ ------------------ ------------------------------------------
    296 4uqp4jczbf1qt        ACTIVE   WORKING On CPU / runqueue                               331         171 NOT IN WAIT

SQL> select sql_fulltext from v$sqlarea where sql_id='&sqlid'; 
Enter value for sqlid: 4uqp4jczbf1qt

SQL_FULLTEXT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
insert into TXXXXX (
        LOGINTICKET,
        MARKETINGOID,
        ISEMERGENCY,
        BUSICHANCEORDERID,
        PARENTID,
        ORDERMODE,
        ORDERID,
        SUBRECDEFID,
        MARKETID,
        PICTUREID,
        STATUSDATE,
        STATUS,
        NOTES,
        DISCOUNT,
        RECFEE,
        DELEGATEID,
        DELEGATEADDR,
        CONTACTPHONE,
        DELEGATEIDTYPE,
        DELEGATENAME,
        IPADDRESS,
        MACADDR,
        LINKREGION,
        LINKTYPE,
        RELATERECOID,
        ISROLLBACK,
        ISNOTIFY,
        ISBACKPROCESS,
        RECDATE,
        RECOPID,
        RECORGID,
        OWNERORGID,
        PRODID,
        VERFITYTYPE,
        RECDEFID,
        ACCESSNUMBER,
        SERVNUMBER,
        CONTACTTYPE,
        CUSTNAME,
        CUSTID,
        ENTITYID,
        ENTITYTYPE,
        FORMNUM,
        GROUPID,
        REGION,
        OID,
        CONTACTOP,
        ISDELEGATEREC
)
values (
        :V0,
        :V1,
        :V2,
        :V3,
        :V4,
        :V5,
        :V6,
        :V7,
        :V8,
        :V9,
        :V10,
        :V11,
        :V12,
        :V13,
        :V14,
        :V15,
        :V16,
        :V17,
        :V18,
        :V19,
        :V20,
        :V21,
        :V22,
        :V23,
        :V24,
        :V25,
        :V26,
        :V27,
        :V28,
        :V29,
        :V30,
        :V31,
        :V32,
        :V33,
        :V34,
        :V35,
        :V36,
        :V37,
        :V38,
        :V39,
        :V40,
        :V41,
        :V42,
        :V43,
        :V44,
        :V45,
        :V46,
        :V47
);


SQL>  select count(*) from v$sql where sql_id='4uqp4jczbf1qt';

  COUNT(*)
----------
     21372

SQL> select version_count from v$sqlarea where sql_id='4uqp4jczbf1qt';

VERSION_COUNT
-------------
           74

SQL> @no_shared
Enter value for sqlid: 4uqp4jczbf1qt

SQL_ID        NONSHARED_REASON                COUNT(*)
------------- ----------------------------- ----------
4uqp4jczbf1qt BIND_MISMATCH                      21428
4uqp4jczbf1qt HASH_MATCH_FAILED                    290
4uqp4jczbf1qt BIND_LENGTH_UPGRADEABLE            17489
		   
SQL> select /*+rule*/ m.position,m.bind_name , m.max_length,count(*) child_cursor_count
     from v$sql s, v$sql_bind_metadata m
     where s.sql_id =  '4uqp4jczbf1qt'
     and m.datatype=1
     and s.child_address = m.address group by m.position,m.bind_name , m.max_length
     order by 1, 2;                
                   

  POSITION BIND_NAME                      MAX_LENGTH CHILD_CURSOR_COUNT
---------- ------------------------------ ---------- ------------------
         1 V0                                     32              21668
         2 V1                                     32              21668
         3 V2                                     22              21668
         4 V3                                     22              21668
         5 V4                                     22              21668
         6 V5                                     32              21668
         7 V6                                     32              21668
         8 V7                                     32              21668
         9 V8                                     32              21668
        10 V9                                     32              21668
        11 V10                                     7              21668
        12 V11                                    32              21668
        13 V12                                    32              18251
        13 V12                                   128               3105
        13 V12                                  2000                312
        14 V13                                    22              21668
        15 V14                                    22              21668
        16 V15                                    32              21668
        17 V16                                    32              21456
        17 V16                                   128                212
        18 V17                                    32              21668
        19 V18                                    32              21668
        20 V19                                    32              21668
        21 V20                                    32              21668
        22 V21                                    32              21668
        23 V22                                    22              21668
        24 V23                                    32              21668
        25 V24                                    22              21668
        26 V25                                    22              21668
        27 V26                                    22              21668
        28 V27                                    22              21668
        29 V28                                     7              21668
        30 V29                                    32              21668
        31 V30                                    32              21668
        32 V31                                    32              21668
        33 V32                                    32              21668
        34 V33                                    32              21668
        35 V34                                    32              21668
        36 V35                                    32              21668
        37 V36                                    32              21668
        38 V37                                    32              21668
        39 V38                                    32              19563
        39 V38                                   128               2105
        40 V39                                    22              21668
        41 V40                                    22              21668
        42 V41                                    32              21668
        43 V42                                    32              21668
        44 V43                                    22              21668
        45 V44                                    22              21668
        46 V45                                    22              21668
        47 V46                                    32              21668
        48 V47                                    22              21668

52 rows selected.
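As a quick sanity check of the listing above, the following Python sketch re-adds the per-bucket child counts for the three binds that appear in more than one bucket (V12, V16 and V38). The numbers are copied from the output; the variable names are only illustrative.

```python
from math import prod

# bind name -> {max_length bucket: child cursor count}, copied from the listing above
observed = {
    "V12": {32: 18251, 128: 3105, 2000: 312},
    "V16": {32: 21456, 128: 212},
    "V38": {32: 19563, 128: 2105},
}
total_children = 21668  # count shown for every bind that sits in a single bucket

for name, buckets in observed.items():
    # Each bind's buckets should add up to the same total number of children.
    assert sum(buckets.values()) == total_children
    above_32 = total_children - buckets[32]
    print(f"{name}: {len(buckets)} buckets, {above_32} children above the 32-byte bucket")

# Distinct bucket combinations these three binds can form together.
combinations = prod(len(buckets) for buckets in observed.values())
print("possible length signatures:", combinations)
```

Only three of the character binds ever leave the 32-byte bucket, and together they allow at most 12 length combinations, so bucket combinations of these binds are only part of the picture behind the child count.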

# enable 10503 level 2000 [max length]
SQL> alter system set events='10503 trace name context forever, level 2000';
System altered.

SQL> oradebug setmypid
Statement processed.
SQL> oradebug eventdump system
10503 trace name context forever, level 2000
10949 trace name context forever,level 1
28401 trace name context forever,level 1

SQL> alter system flush shared_pool;
System altered.

SQL>  select count(*) from v$sql where sql_id='4uqp4jczbf1qt';
  COUNT(*)
----------
         0
SQL> select /*+rule*/ m.position,m.bind_name , m.max_length,count(*) child_cursor_count
  2       from v$sql s, v$sql_bind_metadata m
  3       where s.sql_id =  '4uqp4jczbf1qt'
  4       and m.datatype=1
  5       and s.child_address = m.address group by m.position,m.bind_name , m.max_length
  6       order by 1, 2;     

  POSITION BIND_NAME                      MAX_LENGTH CHILD_CURSOR_COUNT
---------- ------------------------------ ---------- ------------------
         1 V0                                   2000                 86
         2 V1                                   2000                 86
         7 V6                                   2000                 86
         8 V7                                   2000                 86
         9 V8                                   2000                 86
        10 V9                                   2000                 86
        12 V11                                  2000                 86
        13 V12                                  2000                 86
        16 V15                                  2000                 86
        17 V16                                  2000                 86
        18 V17                                  2000                 86
        19 V18                                  2000                 86
        20 V19                                  2000                 86
        21 V20                                  2000                 86
        22 V21                                  2000                 86
        30 V29                                  2000                 86
        31 V30                                  2000                 86
        32 V31                                  2000                 86
        33 V32                                  2000                 86
        34 V33                                  2000                 86
        35 V34                                  2000                 86
        36 V35                                  2000                 86
        37 V36                                  2000                 86
        38 V37                                  2000                 86
        39 V38                                  2000                 86
        42 V41                                  2000                 86
        43 V42                                  2000                 86
        47 V46                                  2000                 86

MySQL 5.7: Generating an "AWR"-style Report with the diagnostics() Procedure

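Oracle Database has the powerful AWR report for analyzing overall server performance. MySQL used to have nothing comparable, and you had to write a lot of custom scripts to collect monitoring data. Starting with MySQL 5.7 (5.7.9), the sys.diagnostics() stored procedure, which relies on PERFORMANCE_SCHEMA, can generate a MySQL performance report similar to Oracle AWR. See the official documentation for details (the URL is in the references at the end of this post).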

This procedure disables binary logging during its execution by manipulating the session value of the sql_log_bin system variable. That is a restricted operation, so the procedure requires privileges sufficient to set restricted session variables.

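Like AWR, this stored procedure builds a global performance report from the deltas between snapshots of the performance views.

The procedure takes three parameters: in_max_runtime, in_interval and in_auto_config.

in_max_runtime: the maximum total collection time, in seconds; NULL uses the default of 60 seconds.
in_interval: the interval between snapshots, in seconds; NULL uses the default of 30 seconds.
in_auto_config: the Performance Schema configuration to use: current, medium or full. The more instruments are enabled, the greater the impact on MySQL server performance; full has the largest impact.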
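The parameter rules above can be captured in a small helper that builds the CALL statement. This is a hedged Python sketch: the function name diagnostics_call is hypothetical, and the only facts it relies on are the three parameters and the three in_auto_config values listed above.

```python
from typing import Optional

VALID_CONFIGS = ("current", "medium", "full")

def diagnostics_call(max_runtime: Optional[int] = None,
                     interval: Optional[int] = None,
                     auto_config: str = "current") -> str:
    """Build the CALL text; None is rendered as NULL so the server default applies."""
    if auto_config not in VALID_CONFIGS:
        raise ValueError(f"in_auto_config must be one of {VALID_CONFIGS}")
    for value in (max_runtime, interval):
        if value is not None and value <= 0:
            raise ValueError("runtime and interval must be positive numbers of seconds")

    def literal(value: Optional[int]) -> str:
        return "NULL" if value is None else str(int(value))

    return (f"CALL sys.diagnostics({literal(max_runtime)}, "
            f"{literal(interval)}, '{auto_config}');")

print(diagnostics_call(120, 30))
print(diagnostics_call())
```

Passing None for the first two arguments leaves the 60-second and 30-second defaults to the server instead of hard-coding them in the client.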

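Below we collect a performance report over 2 minutes, with 30 seconds between snapshots, and write it to a text file. Only part of the output is shown here.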

mysql> tee diag.out;
mysql> CALL sys.diagnostics(120, 30, 'current');
mysql> notee;

view diag.out

+-------------------------+---------------------------------------------------------+
| Name                    | Value                                                   |
+-------------------------+---------------------------------------------------------+
| Hostname                | localhost.localdomain                                   |
| Port                    | 3306                                                    |
| Socket                  | /tmp/mysql.sock                                         |
| Datadir                 | /usr/local/mysql/data/                                  |
| Server UUID             | 44094390-4fa3-11e9-b3ae-080027963204                    |
| ----------------------- | ------------------------------------------------------- |
| MySQL Version           | 5.7.25-enterprise-commercial-advanced                   |
| Sys Schema Version      | 1.5.1                                                   |
| Version Comment         | MySQL Enterprise Server - Advanced Edition (Commercial) |
| Version Compile OS      | el7                                                     |
| Version Compile Machine | x86_64                                                  |
| ----------------------- | ------------------------------------------------------- |
| UTC Time                | 2019-03-26 14:15:37                                     |
| Local Time              | 2019-03-26 10:15:37                                     |
| Time Zone               | SYSTEM                                                  |
| System Time Zone        | EDT                                                     |
| Time Zone Offset        | -04:00:00                                               |
+-------------------------+---------------------------------------------------------+
17 rows in set (0.02 sec)

| InnoDB |      |
=====================================
2019-03-26 10:17:08 0x7f9e104f0700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 31 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 13 srv_active, 0 srv_shutdown, 2670 srv_idle
srv_master_thread log flush and writes: 2683
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 8
OS WAIT ARRAY INFO: signal count 8
RW-shared spins 0, rounds 14, OS waits 7
RW-excl spins 0, rounds 30, OS waits 1
RW-sx spins 0, rounds 0, OS waits 0
Spin rounds per wait: 14.00 RW-shared, 30.00 RW-excl, 0.00 RW-sx
------------
TRANSACTIONS
------------
Trx id counter 3131
Purge done for trx's n:o < 0 undo n:o < 0 state: running but idle
History list length 0
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421792143439696, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (write thread)
I/O thread 7 state: waiting for completed aio requests (write thread)
I/O thread 8 state: waiting for completed aio requests (write thread)
I/O thread 9 state: waiting for completed aio requests (write thread)
Pending normal aio reads: [0, 0, 0, 0] , aio writes: [0, 0, 0, 0] ,
 ibuf aio reads:, log i/o's:, sync i/o's:
Pending flushes (fsync) log: 0; buffer pool: 0
262 OS file reads, 650 OS file writes, 47 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.65 writes/s, 0.06 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 0, seg size 2, 0 merges
merged operations:
 insert 0, delete mark 0, delete 0
discarded operations:
 insert 0, delete mark 0, delete 0
Hash table size 34673, node heap has 1 buffer(s)
Hash table size 34673, node heap has 1 buffer(s)
Hash table size 34673, node heap has 1 buffer(s)
Hash table size 34673, node heap has 1 buffer(s)
Hash table size 34673, node heap has 1 buffer(s)
Hash table size 34673, node heap has 1 buffer(s)
Hash table size 34673, node heap has 1 buffer(s)
Hash table size 34673, node heap has 1 buffer(s)
3.35 hash searches/s, 10.19 non-hash searches/s
---
LOG
---
Log sequence number 2525074
Log flushed up to   2525074
Pages flushed up to 2525074
Last checkpoint at  2525065
0 pending log flushes, 0 pending chkp writes
30 log i/o's done, 0.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 137428992
Dictionary memory allocated 266936
Buffer pool size   8191
Free buffers       7779
Database pages     404
Old database pages 0
Modified db pages  19
Pending reads      0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 231, created 173, written 603
0.00 reads/s, 0.00 creates/s, 0.61 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 404, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
0 read views open inside InnoDB
Process ID=3001, Main thread ID=140316866721536, state: sleeping
Number of rows inserted 15898, updated 0, deleted 0, read 18467
42.61 inserts/s, 0.00 updates/s, 0.00 deletes/s, 43.48 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================

+--------------------------+
| The following output is: |
+--------------------------+
| InnoDB - Transactions    |
+--------------------------+
1 row in set (1 min 31.40 sec)

Empty set (1 min 31.40 sec)

+-------------------------------+
| The following output is:      |
+-------------------------------+
| SELECT * FROM sys.processlist |
+-------------------------------+
1 row in set (1 min 31.40 sec)

+---------------------------------------------------+
| The following output is:                          |
+---------------------------------------------------+
| SELECT * FROM sys.memory_by_host_by_current_bytes |
+---------------------------------------------------+
1 row in set (1 min 31.50 sec)

+------------+--------------------+-------------------+-------------------+-------------------+-----------------+
| host       | current_count_used | current_allocated | current_avg_alloc | current_max_alloc | total_allocated |
+------------+--------------------+-------------------+-------------------+-------------------+-----------------+
| background |                  0 | 0 bytes           | 0 bytes           | 0 bytes           | 0 bytes         |
| localhost  |                  0 | 0 bytes           | 0 bytes           | 0 bytes           | 0 bytes         |
+------------+--------------------+-------------------+-------------------+-------------------+-----------------+
2 rows in set (1 min 31.50 sec)

+-----------------------------------------------------+
| The following output is:                            |
+-----------------------------------------------------+
| SELECT * FROM sys.memory_by_thread_by_current_bytes |
+-----------------------------------------------------+
1 row in set (1 min 31.50 sec)

+-----------+---------------------------------+--------------------+-------------------+-------------------+-------------------+-----------------+
| thread_id | user                            | current_count_used | current_allocated | current_avg_alloc | current_max_alloc | total_allocated |
+-----------+---------------------------------+--------------------+-------------------+-------------------+-------------------+-----------------+
|        16 | innodb/srv_worker_thread        |                  0 | 0 bytes           | 0 bytes           | 0 bytes           | 0 bytes         |
|        17 | innodb/srv_error_monitor_thread |                  0 | 0 bytes           | 0 bytes           | 0 bytes           | 0 bytes         |
|        18 | innodb/srv_monitor_thread       |                  0 | 0 bytes           | 0 bytes           | 0 bytes           | 0 bytes         |
|        19 | innodb/srv_master_thread        |                  0 | 0 bytes           | 0 bytes           | 0 bytes           | 0 bytes         |
|        20 | innodb/srv_worker_thread        |                  0 | 0 bytes           | 0 bytes           | 0 bytes           | 0 bytes         |
|        21 | innodb/srv_purge_thread         |                  0 | 0 bytes           | 0 bytes           | 0 bytes           | 0 bytes         |
|        22 | innodb/srv_lock_timeout_thread  |                  0 | 0 bytes           | 0 bytes           | 0 bytes           | 0 bytes         |
|        23 | innodb/dict_stats_thread        |                  0 | 0 bytes           | 0 bytes           | 0 bytes           | 0 bytes         |
|        24 | innodb/buf_dump_thread          |                  0 | 0 bytes           | 0 bytes           | 0 bytes           | 0 bytes         |
|        25 | sql/signal_handler              |                  0 | 0 bytes           | 0 bytes           | 0 bytes           | 0 bytes         |
|        26 | sql/compress_gtid_table         |                  0 | 0 bytes           | 0 bytes           | 0 bytes           | 0 bytes         |
|        30 | root@localhost                  |                  0 | 0 bytes           | 0 bytes           | 0 bytes           | 0 bytes         |
|         1 | sql/main                        |                  0 | 0 bytes           | 0 bytes           | 0 bytes           | 0 bytes         |
|         2 | sql/thread_timer_notifier       |                  0 | 0 bytes           | 0 bytes           | 0 bytes           | 0 bytes         |
|         3 | innodb/io_ibuf_thread           |                  0 | 0 bytes           | 0 bytes           | 0 bytes           | 0 bytes         |
|         4 | innodb/io_log_thread            |                  0 | 0 bytes           | 0 bytes           | 0 bytes           | 0 bytes         |
|         5 | innodb/io_read_thread           |                  0 | 0 bytes           | 0 bytes           | 0 bytes           | 0 bytes         |
|         6 | innodb/io_read_thread           |                  0 | 0 bytes           | 0 bytes           | 0 bytes           | 0 bytes         |
|         7 | innodb/io_read_thread           |                  0 | 0 bytes           | 0 bytes           | 0 bytes           | 0 bytes         |
|         8 | innodb/io_read_thread           |                  0 | 0 bytes           | 0 bytes           | 0 bytes           | 0 bytes         |
|         9 | innodb/io_write_thread          |                  0 | 0 bytes           | 0 bytes           | 0 bytes           | 0 bytes         |
|        10 | innodb/io_write_thread          |                  0 | 0 bytes           | 0 bytes           | 0 bytes           | 0 bytes         |
|        11 | innodb/io_write_thread          |                  0 | 0 bytes           | 0 bytes           | 0 bytes           | 0 bytes         |
|        12 | innodb/io_write_thread          |                  0 | 0 bytes           | 0 bytes           | 0 bytes           | 0 bytes         |
|        13 | innodb/page_cleaner_thread      |                  0 | 0 bytes           | 0 bytes           | 0 bytes           | 0 bytes         |
|        15 | innodb/srv_worker_thread        |                  0 | 0 bytes           | 0 bytes           | 0 bytes           | 0 bytes         |
+-----------+---------------------------------+--------------------+-------------------+-------------------+-------------------+-----------------+
26 rows in set (1 min 31.55 sec)

+---------------------------------------------------+
| The following output is:                          |
+---------------------------------------------------+
| SELECT * FROM sys.memory_by_user_by_current_bytes |
+---------------------------------------------------+
1 row in set (1 min 31.55 sec)

+------------+--------------------+-------------------+-------------------+-------------------+-----------------+
| user       | current_count_used | current_allocated | current_avg_alloc | current_max_alloc | total_allocated |
+------------+--------------------+-------------------+-------------------+-------------------+-----------------+
| root       |                  0 | 0 bytes           | 0 bytes           | 0 bytes           | 0 bytes         |
| background |                  0 | 0 bytes           | 0 bytes           | 0 bytes           | 0 bytes         |
+------------+--------------------+-------------------+-------------------+-------------------+-----------------+
2 rows in set (1 min 31.56 sec)

+---------------------------------------+
| The following output is:              |
+---------------------------------------+
| SHOW ENGINE PERFORMANCE_SCHEMA STATUS |
+---------------------------------------+
1 row in set (1 min 31.58 sec)

+--------------------+-------------------------------------------------------------+-----------+
| Type               | Name                                                        | Status    |
+--------------------+-------------------------------------------------------------+-----------+
| performance_schema | events_waits_current.size                                   | 176       |
| performance_schema | events_waits_current.count                                  | 1536      |
| performance_schema | events_waits_history.size                                   | 176       |
| performance_schema | events_waits_history.count                                  | 2560      |
| performance_schema | events_waits_history.memory                                 | 450560    |
| performance_schema | events_waits_history_long.size                              | 176       |
| performance_schema | events_waits_history_long.count                             | 10000     |
| performance_schema | events_waits_history_long.memory                            | 1760000   |
| performance_schema | (pfs_mutex_class).size                                      | 256       |

...
+-----------------------------------------------+
| The following output is:                      |
+-----------------------------------------------+
| CALL sys.ps_statement_avg_latency_histogram() |
+-----------------------------------------------+
1 row in set (1 min 31.58 sec)

+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Performance Schema Statement Digest Average Latency Histogram                                                                                                                                                                                                                                                                                                                                                                                                                            |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|

  . = 1 unit
  * = 2 units
  # = 3 units

(0 - 3831ms)      2 | .
(3831 - 7662ms)   0 |
(7662 - 11494ms)  0 |
(11494 - 15325ms) 0 |
(15325 - 19156ms) 0 |
(19156 - 22987ms) 0 |
(22987 - 26819ms) 0 |
(26819 - 30650ms) 0 |
(30650 - 34481ms) 0 |
(34481 - 38312ms) 0 |
(38312 - 42144ms) 0 |
(42144 - 45975ms) 0 |
(45975 - 49806ms) 0 |
(49806 - 53637ms) 0 |
(53637 - 57469ms) 0 |
(57469 - 61300ms) 0 |


+-------------------------------+
| The following output is:      |
+-------------------------------+
| Delta io_by_thread_by_latency |
+-------------------------------+
1 row in set (1 min 31.72 sec)

+---------------------+-------+---------------+-------------+-------------+-------------+-----------+----------------+
| user                | total | total_latency | min_latency | avg_latency | max_latency | thread_id | processlist_id |
+---------------------+-------+---------------+-------------+-------------+-------------+-----------+----------------+
| page_cleaner_thread |   625 | 230.51 ms     | 5.02 us     | 368.82 us   | 86.48 ms    |        13 |           NULL |
| main                |  1715 | 107.05 ms     | 364.34 ns   | 62.42 us    | 78.26 ms    |         1 |           NULL |
| io_write_thread     |    11 | 75.88 ms      | 3.72 ms     | 6.90 ms     | 34.03 ms    |         9 |           NULL |
| srv_master_thread   |    20 | 47.61 ms      | 44.57 us    | 2.38 ms     | 8.08 ms     |        19 |           NULL |
| io_log_thread       |     7 | 30.85 ms      | 3.79 ms     | 4.41 ms     | 5.74 ms     |         4 |           NULL |
| buf_dump_thread     |   108 | 2.00 ms       | 1.89 us     | 18.56 us    | 115.23 us   |        24 |           NULL |
+---------------------+-------+---------------+-------------+-------------+-------------+-----------+----------------+
6 rows in set (1 min 31.72 sec)

+-------------------------------+
| The following output is:      |
+-------------------------------+
| Delta waits_global_by_latency |
+-------------------------------+
1 row in set (1 min 31.81 sec)

+--------------------------------------+-------+---------------+-------------+-------------+
| events                               | total | total_latency | avg_latency | max_latency |
+--------------------------------------+-------+---------------+-------------+-------------+
| wait/io/file/innodb/innodb_data_file |   131 | 114.58 ms     | 874.65 us   | 86.48 ms    |
| wait/io/file/innodb/innodb_log_file  |     4 | 9.15 ms       | 2.29 ms     | 8.08 ms     |
+--------------------------------------+-------+---------------+-------------+-------------+
2 rows in set (1 min 31.81 sec)

+------------------------------------------+
| The following output is:                 |
+------------------------------------------+
| Delta wait_classes_global_by_avg_latency |
+------------------------------------------+
1 row in set (1 min 31.81 sec)

+--------------+-------+---------------+-------------+-------------+-------------+
| event_class  | total | total_latency | min_latency | avg_latency | max_latency |
+--------------+-------+---------------+-------------+-------------+-------------+
| wait/io/file |   135 | 123.73 ms     | 0 ps        | 916.52 us   | 86.48 ms    |
+--------------+-------+---------------+-------------+-------------+-------------+
1 row in set (1 min 31.81 sec)

+--------------------------------------+
| The following output is:             |
+--------------------------------------+
| Delta wait_classes_global_by_latency |
+--------------------------------------+
1 row in set (1 min 31.81 sec)

+--------------+-------+---------------+-------------+-------------+-------------+
| event_class  | total | total_latency | min_latency | avg_latency | max_latency |
+--------------+-------+---------------+-------------+-------------+-------------+
| wait/io/file |   135 | 123.73 ms     | 0 ps        | 916.52 us   | 86.48 ms    |
+--------------+-------+---------------+-------------+-------------+-------------+

+---------------------------+
| The following output is:  |
+---------------------------+
| SELECT * FROM sys.metrics |
+---------------------------+
1 row in set (1 min 31.81 sec)

...

The -H option produces an HTML page, but it has no styling and looks quite ugly; it is still a long way behind what Oracle RDBMS offers.

mysql -u root -p -H -e"CALL sys.diagnostics(120, 30, 'current');" > ./current_instance_report.html

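Combined with an OS crontab entry, this data can provide something similar to Oracle AWR. Note that % is special inside a crontab command and has to be escaped as \%:

0 * * * *  mysql -u root -H  -e"CALL sys.diagnostics(3600, 1800, 'current');" > /home/mysql/awr/instance_report_$(date +"\%Y-\%m-\%d_\%H-\%M")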

references https://dev.mysql.com/doc/refman/5.7/en/sys-diagnostics.html & Mahmoud Hatem’s Archive

Oracle 12c R2 notes: Active Data Guard logon fails with ORA-00604 & ORA-04024


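This is a 12c R2 four-node Oracle RAC environment on RHEL 7 with the 0417 RU installed. The database has a physical Data Guard standby and is also a GoldenGate target, with one replicat process applying data. One day we received a usage alert for the ASM disk group that holds the archived logs; the analysis showed that, not long after go-live, we had already stepped on a mine. This is a brief record of the process.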

# DB alert log

2019-04-12 17:31:43.791000 +08:00
krsd_check_stuck_arch: stuck archiver condition cleared
Unable to create archive log file '+ARCHDG'
Errors in file /oracle/app/oracle/diag/rdbms/anbob/anbob2/trace/anbob2_arc1_64728.trc:
ORA-19504: failed to create file "+ARCHDG"
ORA-17502: ksfdcre:4 Failed to create file +ARCHDG
ORA-15041: diskgroup "ARCHDG" space exhausted
ARC1: Error 19504 Creating archive log file to '+ARCHDG'
krsd_check_stuck_arch: stuck archiver: insufficient local LADs
krsd_check_stuck_arch: stuck archiver condition declared

# Manual archived log cleanup

RMAN> list archivelog all;
...
RMAN> delete archivelog until time "sysdate-1";

released channel: ORA_DISK_1
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=7428 instance=anbob2 device type=DISK
RMAN-08137: warning: archived log not deleted, needed for standby or upstream capture process
archived log file name=+ARCHDG/anbob/ARCHIVELOG/2019_04_10/thread_1_seq_1887.326.1005219643 thread=1 sequence=1887
RMAN-08137: warning: archived log not deleted, needed for standby or upstream capture process
archived log file name=+ARCHDG/anbob/ARCHIVELOG/2019_04_10/thread_1_seq_1888.277.1005226879 thread=1 sequence=1888
RMAN-08137: warning: archived log not deleted, needed for standby or upstream capture process
archived log file name=+ARCHDG/anbob/ARCHIVELOG/2019_04_10/thread_1_seq_1889.324.1005230519 thread=1 sequence=1889
RMAN-08137: warning: archived log not deleted, needed for standby or upstream capture process
archived log file name=+ARCHDG/anbob/ARCHIVELOG/2019_04_10/thread_1_seq_1890.302.1005232111 thread=1 sequence=1890
RMAN-08137: warning: archived log not deleted, needed for standby or upstream capture process
archived log file name=+ARCHDG/anbob/ARCHIVELOG/2019_04_10/thread_1_seq_1891.269.1005233425 thread=1 sequence=1891
RMAN-08137: warning: archived log not deleted, needed for standby or upstream capture process
archived log file name=+ARCHDG/anbob/ARCHIVELOG/2019_04_10/thread_1_seq_1892.259.1005234787 thread=1 sequence=1892
RMAN-08137: warning: archived log not deleted, needed for standby or upstream capture process
archived log file name=+ARCHDG/anbob/ARCHIVELOG/2019_04_10/thread_1_seq_1893.332.1005237411 thread=1 sequence=1893
RMAN-08137: warning: archived log not deleted, needed for standby or upstream capture process
archived log file name=+ARCHDG/anbob/ARCHIVELOG/2019_04_10/thread_1_seq_1894.341.1005240287 thread=1 sequence=1894

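TIP:
The database has a Data Guard standby, and the logs in the archive destination had not yet been shipped and applied. Deleting them with the force option would damage the standby and the later repair would be too troublesome, yet restoring service on the primary had priority. After checking the free space in the ASM disk groups and the daily archived log volume, we decided to temporarily point the archive destination at the ASM disk group that holds the data files.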

SQL> @asmdg

GROUP_NUMBER NAME                           SECTOR_SIZE LOGICAL_SECTOR_SIZE BLOCK_SIZE ALLOCATION_UNIT_SIZE STATE       TYPE     TOTAL_MB    FREE_MB HOT_USED_MB COLD_USED_MB REQUIRED_MIRROR_FREE_MB USABLE_FILE_MB OFFLINE_DISKS COMPATIBILITY                                                DATABASE_COMPATIBILITY                                       V     CON_ID
------------ ------------------------------ ----------- ------------------- ---------- -------------------- ----------- ------ ---------- ---------- ----------- ------------ ----------------------- -------------- ------------- ------------------------------------------------------------ ------------------------------------------------------------ - ----------
           1 ARCHDG                                 512                 512       4096              4194304 CONNECTED   EXTERN    1048576        384           0      1048192                       0            384             0 12.2.0.1.0                                                   10.1.0.0.0                                                   N          0
           2 DATADG                                 512                 512       4096              4194304 CONNECTED   EXTERN    9437184    1141244           0      8295940                       0        1141244             0 12.2.0.1.0                                                   10.1.0.0.0                                                   N          0
           3 MGMTDG                                 512                 512       4096              4194304 MOUNTED     EXTERN     102400      66152           0        36248                       0          66152             0 12.2.0.1.0                                                   10.1.0.0.0                                                   N          0
           4 OCRDG                                  512                 512       4096              4194304 MOUNTED     NORMAL      10240       9084           0         1156                    2048           3518             0 12.2.0.1.0                                                   10.1.0.0.0                                                   Y          0

SQL> @logswith
  THREAD# date        Day total  h00  h01  h02  h03  h04  h05  h06  h07  h08  h09  h10  h11  h12  h13  h14  h15  h16  h17  h18  h19  h20  h21  h22  h23
---------- ----------- --- ----- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
         1 02 APR 2019 Tue    30    2    0    0    2    0    2    0    2    2    2    0    2    0    0    2    0   10    0    2    0    0    2    0    0
         2 02 APR 2019 Tue    44    2    4    0    0    0    0    2    2    2    2    2    2    0    2    4    2    8    2    2    2    2    2    0    0
         3 02 APR 2019 Tue    36    2    2    0    0    0    2    0    2    2    2    0    2    0    2    2    2   10    2    0    2    0    2    0    0
         4 02 APR 2019 Tue    56    2    4    0    2    0    2    2    2    6    6    4    2    0    2    2    4    8    2    2    2    0    2    0    0
         1 03 APR 2019 Wed    24    0    0    0    0    0    4    2    0    2    2    2    2    2    0    2    2    0    2    2    0    0    0    0    0
         2 03 APR 2019 Wed    42    0    0    0    0    0    0    2    0    2    4    4    4    4    4    2    4    2    4    2    2    0    2    0    0
         3 03 APR 2019 Wed    36    0    0    0    0    0    0    2    0    2    4    2    4    4    2    4    2    2    4    2    0    2    0    0    0
         4 03 APR 2019 Wed    50    2    0    0    0    2    0    2    2    6    6    6    2    4    2    4    2    2    4    0    2    2    0    0    0
         1 04 APR 2019 Thu    22    2    0    0    0    0    4    0    0    2    2    2    2    0    2    0    2    0    2    0    2    0    0    0    0
         2 04 APR 2019 Thu    34    2    0    0    0    0    2    0    0    2    2    2    4    2    2    4    2    2    2    2    0    2    2    0    0
         3 04 APR 2019 Thu    32    2    0    0    0    0    0    2    0    2    2    4    2    2    4    2    2    2    2    0    2    0    0    2    0
         4 04 APR 2019 Thu    48    2    0    2    0    0    2    2    0    6    6    6    4    2    2    2    2    2    2    2    2    0    2    0    0
         1 05 APR 2019 Fri    20    0    2    0    0    0    2    0    2    2    2    2    0    0    2    0    0    2    0    0    2    0    0    2    0
         2 05 APR 2019 Fri    28    0    0    0    0    0    0    2    0    4    2    2    0    2    2    2    0    2    2    0    2    2    2    2    0
         3 05 APR 2019 Fri    20    0    0    0    0    0    0    2    0    4    2    0    2    0    2    0    0    2    2    0    0    2    0    2    0
         4 05 APR 2019 Fri    42    0    2    0    2    0    0    2    2   10    6    2    0    2    2    2    0    2    0    2    0    2    2    0    2
         1 06 APR 2019 Sat    46    0    0    2    2    2    4    4    0    4    4    6    4    4    2    4    2    0    0    2    0    0    0    0    0
         2 06 APR 2019 Sat    82    0    0    6    2    2    4    6    4    2    8   12    8   12    4    2    2    2    2    0    2    0    0    2    0
         3 06 APR 2019 Sat    64    0    0    4    4    2    4    2    2    4    2    4    6    8    8    8    2    0    0    2    0    0    2    0    0
         4 06 APR 2019 Sat    80    0    2    2    2    4    2    4    4    8    4    8   10    6    6    4    8    0    2    2    0    0    2    0    0
         1 07 APR 2019 Sun    14    0    0    0    0    2    2    0    0    2    2    0    2    0    0    0    2    0    0    0    0    0    2    0    0
         2 07 APR 2019 Sun    18    0    0    0    0    0    0    2    0    2    0    2    0    2    0    2    2    0    2    0    2    0    0    2    0
         3 07 APR 2019 Sun    12    0    0    0    0    0    0    2    0    2    0    2    0    0    2    0    2    0    0    2    0    0    0    0    0
         4 07 APR 2019 Sun    32    0    2    0    0    2    0    2    2    4    4    4    2    0    2    0    2    2    0    2    0    0    2    0    0
         1 08 APR 2019 Mon    18    0    0    0    0    2    2    0    0    4    2    2    0    0    2    0    0    2    0    0    0    0    0    2    0
         2 08 APR 2019 Mon    30    0    2    0    0    0    0    2    0    6    2    2    2    0    2    2    2    2    0    2    0    2    0    2    0
         3 08 APR 2019 Mon    18    2    0    2    0    0    0    2    0    2    4    0    0    2    0    0    2    0    0    2    0    0    0    0    0
         4 08 APR 2019 Mon    38    0    2    2    0    2    0    2    2    6    8    4    0    2    0    2    0    2    0    2    0    2    0    0    0
         1 09 APR 2019 Tue    12    0    0    0    0    0    2    0    0    2    2    2    0    0    0    0    2    0    0    0    0    0    2    0    0
         2 09 APR 2019 Tue    18    0    0    0    0    0    0    2    0    2    0    2    2    0    2    0    2    2    0    2    0    0    2    0    0
         3 09 APR 2019 Tue    14    0    2    0    0    0    0    0    2    0    2    2    0    0    2    0    0    0    2    0    0    0    0    2    0
         4 09 APR 2019 Tue    30    2    0    0    0    2    0    2    2    4    4    4    0    2    0    2    0    2    0    2    0    0    2    0    0
         1 10 APR 2019 Wed    34    0    0    0    0    0    4    0    0    4    2    6    2    0    2    2    6    2    2    2    0    0    0    0    0
         2 10 APR 2019 Wed    58    0    0    0    0    0    2    0    0    6    4    6    4    2    2    4   16    6    2    2    0    0    2    0    0
         3 10 APR 2019 Wed    46    0    0    0    0    0    0    0    2    4    2    4    4    2    0    4   10    4    4    2    2    2    0    0    0
         4 10 APR 2019 Wed    66    0    0    2    0    0    0    2    2    8    6    6    6    2    2    6   10    4    4    4    0    0    2    0    0
         1 11 APR 2019 Thu    52    0    0    2    2    0    6    2    2    2    6    6    4    4    2    4    4    2    4    0    0    0    0    0    0
         2 11 APR 2019 Thu   106    0    0    4    2    2    4    4    4    6   12   12   10   10    8    6    6    6    6    0    2    0    0    2    0
         3 11 APR 2019 Thu    90    0    0    4    4    2    2    4    0    8    8   10    8    8    6    8    8    4    4    0    2    0    0    0    0
         4 11 APR 2019 Thu   110    0    2    0    2    4    2    4    2   10   12   10   12    6    6   10    8   10    6    0    2    0    0    2    0
         1 12 APR 2019 Fri    22    0    0    2    0    0    2    0    0    4    2    4    2    0    0    2    2    2    0    0    0    0    0    0    0
         2 12 APR 2019 Fri    36    0    0    0    0    2    0    0    0    4    6    4    4    0    2    4    4    4    2    0    0    0    0    0    0
         3 12 APR 2019 Fri    28    0    0    2    0    0    0    0    0    4    4    4    2    2    0    2    4    4    0    0    0    0    0    0    0
         4 12 APR 2019 Fri    54    0    0    2    0    2    0    2    0   10    8    8    6    0    0    6    4    6    0    0    0    0    0    0    0
                           -----      ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
maximum                      110         4    6    4    4    6    6    4   10   12   12   12   12    8   10   16   10    6    4    2    2    2    2    2
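
The two outputs above can be combined into a rough headroom estimate before redirecting archiving to DATADG. The following is only a sketch: the function name is made up, and the size of one archived log is not shown anywhere in this post, so it is a caller-supplied assumption (check the redo log size in v$log).

```shell
#!/bin/sh
# Rough headroom estimate for a temporary archive destination.
# Arguments (all supplied by the caller):
#   $1 free space in the disk group, in MB (FREE_MB / USABLE_FILE_MB)
#   $2 archived logs generated per day, all threads combined
#   $3 size of one archived log, in MB (ASSUMPTION, not shown in the post)
days_of_headroom() {
    free_mb=$1
    logs_per_day=$2
    log_size_mb=$3
    daily_mb=$((logs_per_day * log_size_mb))
    if [ "$daily_mb" -le 0 ]; then
        echo "daily volume must be positive" >&2
        return 1
    fi
    # Integer division: whole days before the disk group fills up,
    # assuming every archived log is written at full size.
    echo $((free_mb / daily_mb))
}
```

For example, with the DATADG free space shown above (1141244 MB), the busiest day in the table (11 APR, 52 + 106 + 90 + 110 = 358 logs) and a hypothetical 1024 MB log size, `days_of_headroom 1141244 358 1024` prints 3. The estimate is pessimistic because archived logs are often smaller than the online redo log.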

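TIP:
The archived log volume had also grown noticeably over the last two days. After the archive destination was changed, the primary recovered quickly. Was there no archived log cleanup job? There was, of course; in fact there were two. One deletes archived logs that have already been applied on both primary and standby, the other forces a cleanup on the standby once usage reaches 80%. The primary also has a usage alert; the reason that alert did not surface the problem this time lies in the layers of approval and is out of scope here.

The primary's archive destination filled up because its archived logs were not deleted; they were not deleted because the standby had not applied them; they were not applied because they had not been shipped to the standby; and they were not shipped because the standby's archive destination was full as well. So why was the standby's archive destination full? The next step was to look at the log of our cleanup script.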

*************************
zzz ***Fri Apr 12 17:00:03 CST 2019
Current archive location usage:100%
Note: Current DG USAGE: 100

Recovery Manager: Release 12.2.0.1.0 - Production on Fri Apr 12 17:00:03 2019

Copyright (c) 1982, 2017, Oracle and/or its affiliates.  All rights reserved.

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00554: initialization of internal recovery manager package failed
RMAN-06003: ORACLE error from target database:
ORA-00604: error occurred at recursive SQL level 2
ORA-04024: self-deadlock detected while trying to mutex pin cursor 0x21FC81808
The current working directory: /home/oracle
Shell: /home/oracle/sdbo/archivelog_clear.sh
current user: oracle
current ORACLE_SID: anbob1
2019-04-12 17:20:01
Current archivelog mode: Archive
archive destination : +ARCHDG
archive destination type : ASM

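Note:
The cleanup job on the standby did detect the high usage. The archived logs are in ASM, so the job calls RMAN to remove them, but RMAN failed to connect to the database with ORA-604 and ORA-4024. Logging in to standby node 1 manually with sqlplus raised the same error.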
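
Because the cleanup job kept running while RMAN could not even connect, it may help to have the job (or a monitor) classify each run from its log. The sketch below assumes a log in the format of the excerpt above; the function name and the three output labels are invented for illustration, and only the 80 percent threshold comes from the post.

```shell
#!/bin/sh
# Classify one cleanup log such as the excerpt above.
# Prints "rman_failed", "usage_high" or "ok".
#   $1 path to the log file
#   $2 usage threshold in percent (defaults to 80, the value in the post)
check_cleanup_log() {
    log_file=$1
    threshold=${2:-80}
    # RMAN-00569 is the error banner seen in the failing run above.
    if grep -q 'RMAN-00569' "$log_file"; then
        echo "rman_failed"
        return 0
    fi
    # Pull the number out of "Current archive location usage:100%".
    usage=$(sed -n 's/^Current archive location usage:\([0-9][0-9]*\)%.*/\1/p' "$log_file" | tail -n 1)
    if [ -n "$usage" ] && [ "$usage" -ge "$threshold" ]; then
        echo "usage_high"
    else
        echo "ok"
    fi
}
```

A run that reports rman_failed means the archive destination will keep filling no matter how often the job is scheduled, so it is worth alerting on separately from the usage threshold.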

oracle@kdanbob1:/home/oracle>  ora  

SQL*Plus: Release 12.2.0.1.0 Production on Fri Apr 12 18:00:39 2019

Copyright (c) 1982, 2016, Oracle.  All rights reserved.

Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

ERROR at line 23:
ORA-00604: error occurred at recursive SQL level 5
ORA-04024: self-deadlock detected while trying to mutex pin cursor 0x21FC81808

At the same time, logging in to standby nodes 2, 3 and 4 raised ORA-01075.

oracle@kdanbob3:/home/oracle>  ora  

SQL*Plus: Release 12.2.0.1.0 Production on Fri Apr 12 18:07:37 2019
Copyright (c) 1982, 2016, Oracle.  All rights reserved.

ERROR:
ORA-01075: you are currently logged on

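Tip:
The standby carries no workload at the moment, so to get apply running again we tried restarting the database. Node 1 could be shut down normally; on the other nodes we killed pmon. After the restart everything was back to normal.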

# standby node1 db alert log

2019-04-12T08:28:08.372598+08:00
ARC5: Archiving not possible: error count exceeded
2019-04-12T08:33:08.438192+08:00
Unable to create archive log file '+ARCHDG'
2019-04-12T08:33:08.438508+08:00
Errors in file /oracle/app/oracle/diag/rdbms/stdanbob/anbob1/trace/anbob1_arc1_11549.trc:
ORA-19504: failed to create file "+ARCHDG"
ORA-17502: ksfdcre:4 Failed to create file +ARCHDG
ORA-15041: diskgroup "ARCHDG" space exhausted
ARC1: Error 19504 Creating archive log file to '+ARCHDG'
2019-04-12T08:33:08.455993+08:00
ARC2: Archiving not possible: error count exceeded
2019-04-12T08:33:08.470123+08:00
ARC3: Archiving not possible: error count exceeded
2019-04-12T08:33:52.231688+08:00
Non critical error ORA-48913 caught while writing to trace file "/oracle/app/oracle/diag/rdbms/stdanbob/anbob1/trace/anbob1_w00a_11310.trc"
Error message: ORA-48913: Writing into trace file failed, file size limit [1024000] reached

# trace file anbob1_w00a..
# call stack, formatted with the following pipeline
sed -n '/Call Stack Trace/,/call_stack_dump/p'|sed -n '5,/*/p'|cut -c1-29|sed '/^[[:space:]]*$/d'|awk -F"(" '{printf $1 " <-" } NR%5==0{printf "\n"}'

*** 2019-04-05T11:24:48.744777+08:00
*** SESSION ID:(11237.49015) 2019-04-05T11:24:48.744825+08:00
*** CLIENT ID:() 2019-04-05T11:24:48.744833+08:00
*** SERVICE NAME:(SYS$BACKGROUND) 2019-04-05T11:24:48.744840+08:00
*** MODULE NAME:(KTSJ) 2019-04-05T11:24:48.744848+08:00
*** ACTION NAME:(KTSJ Slave) 2019-04-05T11:24:48.744856+08:00
*** CLIENT DRIVER:() 2019-04-05T11:24:48.744863+08:00

---- Call Stack Trace -----

ksedst <-kxsGetRuntimeLock <kkscsCheckCursor <-
kkscsSearchChildList <-kksfbc <-kkspsc0 <-kksParseCursor <-
<-opiosq0 <-opiall0 <-opikpr <-opiodr <-
rpidrus <-skgmstack <-rpidru <-rpiswu2 <-kprball <-
kqdGetBundledCursor call <- <-kqdobr_new <-kqrReadFromDB <-kqrpre1 <-
kkdlSetTableVersion call <- <-kkdlgstd <-kkmfcbloCbk <-kkmpfcbk <-
qcsprfro <-qcsprfro_tree <-qcsprfro_tree <-qcspafq <-qcspqbDescendents <-
<-qcspqb <-kkmdrv <-opiSem <-opiprs <-
kksParseChildCursor <-rpiswu2 <-kksLoadChild <-kxsGetRuntimeLock <-
<-kksfbc <-kkspsc0 <-kksParseCursor <-
opiosq0 <-opiall0 <-opikpr <-opiodr <-rpidrus <-
skgmstack <-rpidru <-rpiswu2 <-kprball <-kqdGetBundledCursor call <-
<-kqdobr_new <-kqrReadFromDB <-kqrpre1 <-kkdlSetTableVersion call <-
<-kkdlgstd <-kkmfcbloCbk <-kkmpfcbk <-qcsprfro <-
qcsprfro_tree <-qcsprfro_tree <-qcspafq <-qcspqbDescendents <-
qcspqb <-kkmdrv <-opiSem <-opiDeferredSem <-
opitca <-kksFullTypeCheck <-rpiswu2 <-kksLoadChild <-
kxsGetRuntimeLock <-kksfbc <-kkspsc0 <-kksParseCursor <-
<-opiosq0 <-kpooprx <-kpoal8 <-opiodr <-
kpoodrc <-rpiswu2 <-kpoodr <-upirtrc <-kpurcsc <-
kpuexec <-OCIStmtExecute <-ktslj_segmon <-ktslj_lobmon <-ktsj_task_switch <-
<-ktsj_execute_task <-ktsj_slave_main <
ksvrdp_int <-opirip <-opidrv <-sou2o <-opimai_real <-
ssthrdmain <-main <-__libc_start_main <-+245 <-_start <-

Maximum map count configured per process:  65530
===================================================
PROCESS STATE
-------------
Process global information:
     process: 0x3327efe60, call: 0x2169daef0, xact: (nil), curses: 0x353bc5da8, usrses: 0x343c899d0
     in_exception_handler: no
  ----------------------------------------
  SO: 0x3327efe60, type: 2, owner: (nil), flag: INIT/-/-/-/0x00 if: 0x3 c: 0x3
   proc=0x3327efe60, name=process, file=ksu.h LINE:15729, pg=0, conuid=0
  (process) Oracle pid:52, ser:205, calls cur/top: 0x2169daef0/0x27ecc5ad8
            flags : (0x2) SYSTEM  icon_uid:0 logon_pdbid=0
            flags2: (0x30),  flags3: (0x10) 
            call error: 0, sess error: 0, txn error 0
            intr queue: empty
    (post info) last post received: 307 0 3
                last post received-location: ksl2.h LINE:4497 ID:kslpsr
                last process to post me: 0x302838628 1 2
                last post sent: 0 0 85
                last post sent-location: kso2.h LINE:1054 ID:ksoreq_reply
                last process posted by me: 0x302838628 1 2
                waiter on post event: 0
    (latch info) hold_bits=0x0 ud_influx=0x0
    (osp latch info) hold_bits=0x0 ud_influx=0x0
    Process Group: DEFAULT, pseudo proc: 0x302cd0920
    O/S info: user: oracle, term: UNKNOWN, ospid: 81006 
    OSD pid info: 
    KGL-UOL (Process state object)
    KGX Atomic Operation Log 0x3327f0b28
     Mutex (nil)(0, 0) idn 0 oper NONE(0)
     FSO mutex uid 65534 efd 0 whr 0 slp 0
...
    KGX Atomic Operation Log 0x3327f0e68
     Mutex (nil)(0, 0) idn 0 oper NONE(0)
     FSO mutex uid 65534 efd 0 whr 0 slp 0
    ----------------------------------------
    SO: 0x2931f29d8, type: 91, owner: 0x3327efe60, flag: INIT/-/-/-/0x00 if: 0x1 c: 0x1
     proc=0x3327efe60, name=KTSJ state object, file=ktsjcts.h LINE:915, pg=0, conuid=0
    KTSJProc Type: Slave - 7
KTSJTASK ptr:0x2a46365e8
tid:1  class:LOB Monitor  status:0x3(READY/RUN/-/-/-/-)
tdata:(nil) tdatal:0
lastrun:0 lastfin:0 currun:0 nextrun:0
con_uid:0 flag:0x0(-)
    ----------------------------------------
    SO: 0x343c899d0, type: 4, owner: 0x3327efe60, flag: INIT/-/-/-/0x00 if: 0x3 c: 0x3
     proc=0x3327efe60, name=session, file=ksu.h LINE:15737, pg=0, conuid=0
    (session) sid: 11237 ser: 49015 trans: (nil), creator: 0x3327efe60
              flags: (0x51) USR/- flags2: (0x80409) -/-/INC
              flags_idl: (0x1) status: BSY/-/-/- kill: -/-/-/-
              DID: 0001-0034-00000A780001-0034-00000A79, short-term DID: 
              txn branch: (nil)
              edition#: 0              user#/name: 0/SYS
              oct: 3, prv: 0, sql: 0x27ff93570, psql: 0x22e2381c8
              stats: 0x26fd158c8, PX stats: 0x11089e04
    service name: SYS$BACKGROUND
    Current Wait Stack:
      Not in wait; last wait ended 0.245902 sec ago 
    Wait State:
      fixed_waits=0 flags=0x21 boundary=(nil)/-1
    Session Wait History:
        elapsed time of 0.245932 sec since last wait
     0: waited for 'PGA memory operation'
        =0x10000, =0x1, =0x0
        wait_id=40 seq_num=41 snap_id=1
        wait times: snap=0.000016 sec, exc=0.000016 sec, total=0.000016 sec
        wait times: max=infinite
        wait counts: calls=0 os=0
        occurred after 0.004171 sec of elapsed time
     1: waited for 'ges resource directory to be unfrozen'
        =0x0, =0x0, =0x0
        wait_id=39 seq_num=40 snap_id=1
        wait times: snap=0.000001 sec, exc=0.000001 sec, total=0.000001 sec
        wait times: max=infinite
        wait counts: calls=0 os=0
        occurred after 0.000159 sec of elapsed time
     2: waited for 'PGA memory operation'
        =0x10000, =0x1, =0x0
        wait_id=38 seq_num=39 snap_id=1
        wait times: snap=0.000008 sec, exc=0.000008 sec, total=0.000008 sec
        wait times: max=infinite
        wait counts: calls=0 os=0
        occurred after 0.001252 sec of elapsed time
...
    ----------------------------------------
## search 2169daef0
                ----------------------------------------
                SO: 0x25a8dd8f0, type: 3, owner: 0x200c75980, flag: INIT/-/-/-/0x00 if: 0x3 c: 0x3
                 proc=0x3327efe60, name=call, file=ksu.h LINE:15733, pg=0, conuid=0
                (call) sess: cur 353bc5da8, rec 353bc5da8, usr 343c899d0; flg:40 fl2:1; depth:6
                svpt(xcb:(nil) sptn:0x1d uba: 0x00000000.0000.00 uba: 0x00000000.0000.00)
                  ----------------------------------------
                  SO: 0x2169daef0, type: 3, owner: 0x25a8dd8f0, flag: INIT/-/-/-/0x00 if: 0x3 c: 0x3
                   proc=0x3327efe60, name=call, file=ksu.h LINE:15733, pg=0, conuid=0
                  (call) sess: cur 353bc5da8, rec 353bc5da8, usr 343c899d0; flg:0 fl2:1; depth:7
                  svpt(xcb:(nil) sptn:0x1f uba: 0x00000000.0000.00 uba: 0x00000000.0000.00)
                  ----------------------------------------
                  SO: 0x21f8b2c98, type: 102, owner: 0x25a8dd8f0, flag: INIT/-/-/-/0x00 if: 0x1 c: 0x1
                   proc=0x3327efe60, name=row cache enqueues, file=kqr.h LINE:2319, pg=0, conuid=0
                  row cache enqueue: count=1 session=0x343c899d0 object=0x21f9e9d28, mode=S
                  savepoint=0x1d
                  type=MULTI-INSTANCE instance locked=T handle=0x367b752c0
                  row cache parent object: addr=0x21f9e9d28 cid=8(dc_objects) conid=0 conuid=0
                  hash=6e50015a typ=61 transaction=(nil) flags=00008000 inc=1, pdbinc=1
                  objectno=0 ownerid=0 nsp=1
                  name=OBJ$
                  own=0x21f9e9df8[0x21f8b2d58,0x21f8b2d58] wat=0x21f9e9e08[0x21f9e9e08,0x21f9e9e08] mode=S req=N
                  status=-/-/-/-/-/-/-/-/LOADING
                  instance lock=QI 6e50015a 1f898cb3
                  set=0, complete=FALSE

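TIP:
KTSJ ==> Kernel Transaction Space Job. This appears to be the process related to space preallocation, introduced in 12.1.0.2 and controlled by _enable_space_preallocation.

Searching MOS next, it is not hard to confirm that this matches the known bug described in Doc ID 2438982.1.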

12.2.0.1 Active DataGuard AKA ADG ORA-4024 self-deadlock detected while trying to mutex pin cursor ( Doc ID 2438982.1 )
Bug 27716177 – ADG: ORA-04021:ORA-04024:ROW CACHE ENQUEUE AGAINST DC_OBJECTS:OBJ$ <==closed as following duplicate unpublished bug
BUG 28423598 – GOLDENGATE AUTH CAUSES ACTIVE DG TO BE UNUSABLE UNTIL BOUNCE

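It appears that a GoldenGate (OGG) authentication problem causes the ADG instance to hang, waiting indefinitely on the row cache enqueue for dc_objects (OBJ$). The temporary workaround is, again, a restart.

One-off patch 28423598 is now available; download and install the version for your platform. The fix has been merged into the 19.1 (19c) base release.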

Troubleshooting Out-Of-Memory(OOM) killer db crash when memory exhausted


# db alert log

Warning: VKTM detected a time drift.
Time drifts can result in an unexpected behavior such as time-outs. Please check trace file for more details.
Tue Apr 23 08:54:27 2019
WARNING: Heavy swapping observed on system in last 5 mins.
pct of memory swapped in [3.68%] pct of memory swapped out [13.12%].
Please make sure there is no memory pressure and the SGA and PGA 
are configured correctly. Look at DBRM trace file for more details.
Tue Apr 23 08:56:27 2019
Thread 1 cannot allocate new log, sequence 10395
Private strand flush not complete
  Current log# 2 seq# 10394 mem# 0: /hescms/oradata/anbob/redo02a.log
  Current log# 2 seq# 10394 mem# 1: /hescms/oradata/scms/redo02b.log
Thread 1 advanced to log sequence 10395 (LGWR switch)
  Current log# 3 seq# 10395 mem# 0: /hescms/oradata/scms/redo03a.log
  Current log# 3 seq# 10395 mem# 1: /hescms/oradata/scms/redo03b.log
Tue Apr 23 08:56:41 2019
Archived Log entry 10505 added for thread 1 sequence 10394 ID 0xaef43455 dest 1:
Tue Apr 23 09:08:37 2019
System state dump requested by (instance=1, osid=8886 (PMON)), summary=[abnormal instance termination].
Tue Apr 23 09:08:37 2019
PMON (ospid: 8886): terminating the instance due to error 471
System State dumped to trace file /ora/diag/rdbms/scms/scms/trace/scms_diag_8896_20190423090837.trc
Tue Apr 23 09:08:37 2019
opiodr aborting process unknown ospid (22614) as a result of ORA-1092
Tue Apr 23 09:08:38 2019
opiodr aborting process unknown ospid (27627) as a result of ORA-1092
Instance terminated by PMON, pid = 8886
Tue Apr 23 09:18:18 2019
Starting ORACLE instance (normal)

# OS log /var/log/messages

Apr 23 08:52:18 anbobdb kernel: NET: Unregistered protocol family 36
Apr 23 09:07:28 anbobdb kernel: oracle invoked oom-killer: gfp_mask=0x84d0, order=0, oom_adj=0, oom_score_adj=0
Apr 23 09:07:32 anbobdb rtkit-daemon[3097]: The canary thread is apparently starving. Taking action.
Apr 23 09:07:47 anbobdb kernel: oracle cpuset=/ mems_allowed=0-4
Apr 23 09:07:47 anbobdb kernel: Pid: 22753, comm: oracle Not tainted 2.6.32-431.el6.x86_64 #1
Apr 23 09:07:47 anbobdb rtkit-daemon[3097]: Demoting known real-time threads.
Apr 23 09:07:47 anbobdb rtkit-daemon[3097]: Demoted 0 threads.
Apr 23 09:07:47 anbobdb kernel: Call Trace:
Apr 23 09:07:47 anbobdb kernel: [] ? dump_header+0x90/0x1b0
Apr 23 09:07:47 anbobdb kernel: [] ? security_real_capable_noaudit+0x3c/0x70
Apr 23 09:07:47 anbobdb kernel: [] ? oom_kill_process+0x82/0x2a0
Apr 23 09:07:47 anbobdb kernel: [] ? select_bad_process+0xe1/0x120
Apr 23 09:07:47 anbobdb kernel: [] ? out_of_memory+0x220/0x3c0
Apr 23 09:07:47 anbobdb kernel: [] ? __alloc_pages_nodemask+0x8ac/0x8d0
Apr 23 09:07:47 anbobdb rtkit-daemon[3097]: The canary thread is apparently starving. Taking action.
Apr 23 09:07:47 anbobdb rtkit-daemon[3097]: Demoting known real-time threads.
Apr 23 09:07:47 anbobdb rtkit-daemon[3097]: Demoted 0 threads.
Apr 23 09:07:48 anbobdb kernel: [] ? alloc_pages_current+0xaa/0x110
Apr 23 09:07:52 anbobdb kernel: [] ? pte_alloc_one+0x1b/0x50
Apr 23 09:07:52 anbobdb kernel: [] ? __pte_alloc+0x32/0x160
Apr 23 09:07:52 anbobdb kernel: [] ? handle_mm_fault+0x1c0/0x300
Apr 23 09:07:52 anbobdb kernel: [] ? down_read_trylock+0x1a/0x30

Note:  OS messages indicating resource shortage, OOM killer etc (TFA will collect this)

What is the OOM killer?
The OOM killer, a feature enabled by default in the Linux kernel, is a self-protection mechanism that the kernel uses under severe memory pressure.
If the kernel cannot find memory to allocate when it is needed, it puts in-use user data pages on the swap-out queue to be swapped out. If the Virtual Memory (VM) subsystem can neither allocate memory nor swap out in-use memory, the out-of-memory killer may begin killing userspace processes: it sacrifices one or more processes to free up memory for the system when all else fails.

The behavior of OOM killer in principle is as follows:
Lose the minimum amount of work done
Recover as much as memory it can
Do not kill anything that is not itself using a lot of memory
Kill the minimum amount of processes (one)
Try to kill the process the user expects to kill

Probable causes:

1 Spike in memory usage based on a load event (additional processes are needed for increased load).
2 Spike in memory usage based on additional services being added or migrated to the system. (Added another app or started a new service on the system)
3 Spike in memory usage due to failed hardware such as a DIMM memory module.
4 Spike in memory usage due to undersizing of hardware resources for the running application(s).
5 There’s a memory leak in a running application.

If the application uses mlock() or HugeTLB pages (HugePages), that memory cannot be moved to swap, because locked pages and HugePages are not swappable. In that case SwapFree may still show a very large value when the OOM occurs. Overusing locked pages or HugePages can therefore exhaust system memory and leave the system with no other recourse.

Troubleshooting
Check how often the out-of-memory (OOM) killer has run.
$ egrep 'Out of memory:' /var/log/messages

Check how much memory the killed processes were using.
$ egrep 'total-vm' /var/log/messages

For further analysis, check the system activity reporter (SAR) data to see what it captured about the OS.

Check swap statistics with the -S flag: a high %swpused indicates swapping and a memory shortage.
$ sar -S -f /var/log/sa/sa2

Check CPU and I/O wait statistics: high %user or %system indicates a busy system, and high %iowait means the system is spending significant time waiting on the underlying storage.
$ sar -f /var/log/sa/sa31

Check memory statistics: high %memused and %commit values mean the system is using nearly all of its memory and that a lot of memory is committed to processes (a high %commit is the more concerning of the two).
$ sar -r -f /var/log/sa/sa

Lastly, check the amount of memory on the system, and how much is free/available:

$ free -m or cat /proc/meminfo or dmidecode -t memory
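
The log checks earlier in this list can be wrapped in a small helper so they are easy to run across several hosts. This is a minimal sketch with an invented function name; it counts both the "invoked oom-killer" lines seen in the excerpt above and the "Out of memory:" lines that the egrep command looks for.

```shell
#!/bin/sh
# Count OOM killer events in a syslog-style file.
#   $1 log file to scan (defaults to /var/log/messages)
# Matches "invoked oom-killer" (as in the excerpt above) as well as
# "Out of memory:" lines.
count_oom_events() {
    log_file=${1:-/var/log/messages}
    grep -c -E 'invoked oom-killer|Out of memory:' "$log_file"
}
```

Running `count_oom_events /var/log/messages` prints a single number; note that grep returns a non-zero exit status when the count is 0, which matters if the caller runs under `set -e`.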

In an Oracle environment, first check whether the SGA and PGA are sized reasonably. In this case we later reduced the size of these memory areas, left more free memory for the operating system, and configured HugePages. The benefits of HugePages have been described many times and are not repeated here. If you increase the number of HugePages, also check whether the upper limit set by kernel.shmall has been reached. Finally, check application processes for memory leaks, including PGA leaks. Related post: config hugepage linux
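
The sizing arithmetic behind that advice can be sketched as follows. The function names are invented for this example, the 2048 kB huge page size and 4096 byte system page size are only common x86_64 defaults (read Hugepagesize from /proc/meminfo and use getconf PAGE_SIZE on the real host), and the result is a lower bound rather than a tuned setting.

```shell
#!/bin/sh
# Number of HugePages needed to back an SGA of a given size.
#   $1 SGA size in MB
#   $2 huge page size in kB (default 2048, an assumption)
hugepages_needed() {
    sga_mb=$1
    hugepage_kb=${2:-2048}
    sga_kb=$((sga_mb * 1024))
    # Round up so the last partial page is still covered.
    echo $(( (sga_kb + hugepage_kb - 1) / hugepage_kb ))
}

# kernel.shmall is counted in system pages, so an SGA of $1 MB needs
# at least this many pages of shmall.
#   $2 system page size in bytes (default 4096, an assumption)
shmall_pages_needed() {
    sga_mb=$1
    page_bytes=${2:-4096}
    echo $(( (sga_mb * 1024 * 1024 + page_bytes - 1) / page_bytes ))
}
```

For a hypothetical 80 GB SGA, `hugepages_needed 81920` prints 40960 and `shmall_pages_needed 81920` prints 20971520; vm.nr_hugepages and kernel.shmall should be at least those values, with some margin for any other shared memory users on the host.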

 

References: Linux: Out-of-Memory (OOM) Killer (Doc ID 452000.1) and the RHEL online docs.


Troubleshooting sqlplus logon instance slow and Swap usage high even memory is 50% free


Sometimes logging in to a local Oracle database instance with sqlplus “/ as sysdba” takes a minute or even longer. This is not normal, and the cause needs to be diagnosed.

Troubleshooting
Check CPU and memory usage at the OS level
Check the alert log, sqlnet.log, listener.log and the OS logs
On Solaris, the cause may be DISM
Use a tool such as strace (Linux) or truss (Solaris, AIX) to trace the connection, e.g.

strace -f -o /tmp/test_conn.log sqlplus / as sysdba
truss -aefdD -o /tmp/test_conn.log   sqlplus '/as sysdba'

A few days ago I ran into a case on an 11.2.0.4 three-node Oracle RAC database on RHEL 6.6. Logging in to the instance on the third node with sqlplus “/ as sysdba” was very slow, and vmstat showed heavy swap-in and swap-out activity even though a lot of memory was still free, as in the picture below.

vmstat can be used to check for active swapping live. Non-zero values in the si and so columns indicate memory being swapped in and out.
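When the vmstat output is long, the si and so check can be automated. The sketch below reads vmstat output on stdin and counts the samples with swap traffic. The function name swap_activity is made up for illustration, and the column positions are taken from the vmstat header row rather than hard-coded.

```shell
# Read vmstat output on stdin and report how many samples show swap I/O.
# Usage: vmstat 1 10 | swap_activity
# swap_activity is a hypothetical helper name.
swap_activity() {
    awk '
        # Header row: remember which columns hold si and so.
        { for (i = 1; i <= NF; i++) { if ($i == "si") si = i; if ($i == "so") so = i } }
        # Data rows start with a number; count the ones with non-zero si or so.
        si && so && $1 ~ /^[0-9]+$/ { total++; if ($si > 0 || $so > 0) active++ }
        END { printf "%d of %d samples show swap I/O\n", active, total }
    '
}
```

Keep in mind that the first sample vmstat prints is an average since boot, so a non-zero value there does not prove that swapping is happening right now.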

Swap is considerably slower than actual RAM, and it is not recommended as a substitute for memory. Adding swap only buys time to troubleshoot the issue further and then either free up memory or add more to the system.

Why use Swap when memory is 50% free?

First of all, this database instance is not configured with HugePages. The host has 200GB of physical memory, and the SGA and PGA configured for the database total about 110GB. About 100GB is currently free, yet the si and so columns tell us that swap is in use.

Swap space is the area on a hard disk which is part of the Virtual Memory of your machine, which is a combination of accessible physical memory (RAM) and the swap space. Swap space temporarily holds memory pages that are inactive. Swap space is used when your system decides that it needs physical memory for active processes and there is insufficient unused physical memory available. If the system happens to need more memory resources or space, inactive pages in physical memory are then moved to the swap space therefore freeing up that physical memory for other uses.

The Linux kernel automatically moves RAM that programs have reserved but do not really use into swap, so that it can serve the better purpose of extending cached memory. By default, Linux may choose to swap out a process or some of its data because of low usage, even when it is not running low on RAM.

The Linux kernel will migrate some data from memory which has not been accessed for some time out to swap to make room for a possible future demand. This standard paging from memory to swap is performed to optimize the overall system performance.
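To see which processes actually own the swapped-out pages, the VmSwap line in /proc/<pid>/status can be summed up. This is a sketch under the assumption that the kernel exposes VmSwap (older kernels may not). The function name swap_by_process and its optional arguments are invented for illustration.

```shell
# List the processes holding the most swap (in kB), largest first.
# Usage: swap_by_process [proc_root] [top_n]   (defaults: /proc, 10)
# swap_by_process is a hypothetical helper name.
swap_by_process() {
    # Name: comes before VmSwap: in every status file; kernel threads have no VmSwap line.
    awk '/^Name:/ { name = $2 }
         /^VmSwap:/ && $2 > 0 { print $2, name }' "${1:-/proc}"/[0-9]*/status 2>/dev/null |
        sort -rn | head -n "${2:-10}"
}
```

Called without arguments it scans the live /proc. Processes that exit during the scan are silently skipped.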

This paging activity is controlled with the /proc/sys/vm/swappiness setting.

 

What is swappiness and how do I change it?

The swappiness parameter controls the tendency of the kernel to move processes out of physical memory and onto the swap disk. Because disks are much slower than RAM, this can lead to slower response times for system and applications if processes are too aggressively moved out of memory.

  • swappiness can have a value of between 0 and 100
  • swappiness=0 tells the kernel to avoid swapping processes out of physical memory for as long as possible
  • swappiness=100 tells the kernel to aggressively swap processes out of physical memory and move them to swap cache

The default is 60 on RHEL 5 and 6, and 30 on RHEL 7. Reducing the default value of swappiness will probably improve overall performance for a typical Ubuntu desktop installation. A value of swappiness=10 is recommended, but feel free to experiment.

Increasing this value will make the system more inclined to swap inactive memory pages to disk, rather than dropping pages from the page cache, leaving more memory free for cached I/O. This may be preferred for heavy I/O workloads. Decreasing this value will make the system less inclined to swap, and more inclined to drop pages from the page cache.

Note: while vm.swappiness accepts values from 0 to 100, it should not be set to 0, as this can cause unexpected behaviour on the system. 0 is a special value that works differently depending on the RHEL version in use:

  • Prior to RHEL 6.4: vm.swappiness = 0 disables swapping for the most part, except to avoid an out-of-memory situation.
  • RHEL 6.4 and later: vm.swappiness = 0 disables swapping almost completely. During global reclaim the kernel does not swap at all until the amount of free and file-backed pages in a zone has been reduced to something very small (nr_free + nr_filebacked < high watermark).

You can use the following script to move swapped-out pages back into RAM manually:

#!/bin/bash
# Pull swapped pages back into RAM; needs root and enough free memory.
err="not enough RAM to write swap back, nothing done"
mem=$(free | awk '/^Mem:/ {print $4}')    # free RAM in KiB
swap=$(free | awk '/^Swap:/ {print $3}')  # used swap in KiB
if [ "$mem" -lt "$swap" ]; then
    echo "$err" >&2
    exit 1
fi
swapoff -a && swapon -a

Set the vm.swappiness parameter, e.g.

To change it online (takes effect immediately):
echo 10 > /proc/sys/vm/swappiness
To make the setting persistent across reboots, append this line to /etc/sysctl.conf:
vm.swappiness = 10
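After changing both places it is worth confirming that the running value and the persisted value agree. A small sketch: the function name persisted_swappiness is made up for illustration, and it only looks at a single file, so settings under /etc/sysctl.d are not considered.

```shell
# Print the vm.swappiness value a sysctl-style file would apply.
# Usage: persisted_swappiness [file]   (default: /etc/sysctl.conf)
# persisted_swappiness is a hypothetical helper name.
persisted_swappiness() {
    # Commented-out lines do not match; when the key repeats, the last entry wins.
    sed -n 's/^[[:space:]]*vm\.swappiness[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' \
        "${1:-/etc/sysctl.conf}" | tail -n 1
}

# Compare with the running value:
# [ "$(persisted_swappiness)" = "$(cat /proc/sys/vm/swappiness)" ] || echo "runtime and sysctl.conf differ"
```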

In an Oracle environment, should swappiness be large or small?

Oracle: increase vs. Red Hat: decrease

Oracle recommends increasing it. The reason Oracle gives: in stress testing, setting swappiness to 100 was found to reduce or delay node evictions caused by memory exhaustion during logon storms.

Red Hat recommends decreasing it. The document "Tuning Red Hat Enterprise Linux for Oracle and Oracle RAC" states that swapping is bad for Oracle and recommends setting the vm.swappiness parameter to 0.

The two opinions are opposite, so which one should I follow?

RHEL: mainly from the perspective of performance. Put simply, when there is enough memory you should avoid using swap, because constant paging only degrades performance.

Oracle: mainly from the perspective of availability. A typical scenario is a two-node RAC where one node goes down for some reason. Connections fail over to the surviving node, which cannot handle the sudden flood of connections because of insufficient memory, and hangs.

In environments with plenty of free physical memory, I recommend decreasing this value. After all, using swap can cause performance problems.

 

Reference  Why is SWAP being used instead of available physical memory? (Doc ID 2404462.1) and RHEL online docs.

Wait Event: buffer deadlock


While analyzing a database performance case in which many sessions hung on an 11.2.0.4 RAC Active Data Guard database, I saw a large number of foreground sessions waiting on this event (buffer deadlock) and on buffer busy waits. What is buffer deadlock? Here is a short record.

Oracle Documents

buffer deadlock

Oracle does not really wait on this event; the foreground only yields the CPU. Thus, the chances of catching this event are very low. This is not an application induced deadlock, but an assumed deadlock by the cache layer. The cache layer cannot get a buffer in a certain mode within a certain amount of time.

Wait Time: 0 seconds. The foreground process only yields the CPU and will usually be placed at the end of the CPU run queue.

For buffer deadlock, P1 gives the data block address (dba) of the requested block.

SQL> @sed deadlock
Show wait event descriptions matching %deadlock%..

EVENT# EVENT_NAME                                              WAIT_CLASS           PARAMETER1                PARAMETER2                PARAMETER3                ENQUEUE_NAME                   REQ_REASON                       REQ_DESCRIPTION
------ ------------------------------------------------------- -------------------- ------------------------- ------------------------- ------------------------- ------------------------------ -------------------------------- ----------------------------------------------------------------------------------------------------
  787 buffer deadlock                                         Other                dba                       class*10+mode             flag

SQL> @dec 3571369274

                                DEC                  HEX
----------------------------------- --------------------
                  3571369274.000000             D4DEC53A

SQL> @dba D4DEC53A

    RFILE#     BLOCK# BIGFILE_BLOCK# DUMP_CMD
---------- ---------- -------------- ---------------------------------------------------------------------------------------------------------------------
       851    2016570     3571369274 -- alter system dump datafile 851 block 2016570
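The same decoding can be done without a database connection. For a smallfile tablespace the data block address packs the relative file number into the top 10 bits and the block number into the low 22 bits, so a shell one-liner is enough. The function name decode_dba is made up for illustration, and bigfile tablespaces are not handled.

```shell
# Split a data block address (P1 of the wait event) into relative file# and block#.
# Accepts decimal or 0x-prefixed hex. Smallfile tablespaces only.
# decode_dba is a hypothetical helper name.
decode_dba() {
    dba=$(( $1 ))
    # Top 10 bits: relative file number; low 22 bits: block number.
    echo "$(( dba >> 22 )) $(( dba & 0x3FFFFF ))"
}
```

decode_dba 3571369274 prints "851 2016570", matching the RFILE# and BLOCK# columns shown above.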

Buffer deadlock is an unusual wait event that is not often seen in a database. When it does show up in volume, though, it becomes a real performance bottleneck for the server. It can appear when CPU resources are short. Its characteristics are similar to those of buffer busy waits. The event rarely appears in v$system_event or v$session_wait, and when it does, it passes very quickly and is difficult to detect. The buffer busy waits event itself has been explained in a separate post.

A form of enqueue locking is used to protect cached database blocks. For each buffer in the database buffer cache, there is a buffer header. The buffer headers constitute a fixed array in the permanent memory part of the shared pool. These buffer headers act as the resource structures for buffer locks. Sessions manipulate buffer headers, and thus buffers, via dynamically allocated structures known as buffer handles. The buffer handles act as the lock structures for buffer locks.

Buffer locks are taken only in shared and exclusive modes. The buffer headers implement a two-way linked list of the buffer handles for sessions that are using the buffer, and another for the buffer handles of sessions waiting for the buffer.

If a buffer lock deadlock is suspected, the session that timed out trying to acquire a buffer lock releases the buffer locks that it is holding on other buffers, and immediately enqueues them again, thereby falling to the end of the queue of waiting sessions. It also posts the first process that was waiting for a lock on each of the buffers concerned, and then yields the CPU. Although yielding the CPU does not really constitute a wait, a buffer deadlock wait is recorded and the exchange deadlocks statistic is incremented.

Known issues

Bug 27138798  – HIGH BUFFER DEADLOCK WAITS AFTER APPLYING 12.1.0.2.170814 DBBP
Bug 21893830 ORA-600 [kcbgtcr_13] / ‘gc cr failure’ , ‘cr request retry’ and ‘buffer deadlock’ on ADG – Superseded
Bug 14195003 Deadlock with “gc current request” and “gc buffer busy” is possible on RAC and wait forever
Bug 17695685 Hang in Active Dataguard Database with RAC

Reference <Oracle Internals: An Introduction>

Scripts: Tablespace Report for Oracle 12c Multitenant Database


You can use this SQL script to report tablespace space details in 12c Multitenant database.

--
-- file: tablespace_rpt12.sql
-- purpose: To report tablespaces for 12c+ Multitenant database
-- author: weejar zhang (www.anbob.com)
--
SET LINES 300 PAGES 100
COL con_name        FORM A15 HEAD "Container|Name"
COL files           FORM 999,999 HEAD "Num Files"
COL tablespace_name FORM A30
COL fsm             FORM 999,999,999,999 HEAD "Free|Space Meg."
COL apm             FORM 999,999,999,999 HEAD "Alloc|Space Meg."
--
COMPUTE SUM OF fsm apm files ON con_id REPORT
BREAK ON REPORT ON con_id ON con_name ON tablespace_name
--
WITH x AS (SELECT c1.con_id, cf1.tablespace_name, SUM(cf1.bytes)/1024/1024 fsm
           FROM cdb_free_space cf1
               ,v$containers c1
           WHERE cf1.con_id = c1.con_id
           GROUP BY c1.con_id, cf1.tablespace_name),
     y AS (SELECT c2.con_id, cd.tablespace_name, count(*) files,SUM(cd.bytes)/1024/1024 apm
           FROM cdb_data_files cd
               ,v$containers c2
           WHERE cd.con_id = c2.con_id
           GROUP BY c2.con_id
                   ,cd.tablespace_name)
SELECT x.con_id, v.name  con_name, x.tablespace_name,files, x.fsm, y.apm, round(1-fsm/apm,2) pct
FROM x, y, v$containers v
WHERE x.con_id          = y.con_id
AND   x.tablespace_name = y.tablespace_name
AND   v.con_id          = y.con_id
UNION All
SELECT vc2.con_id, vc2.name , tf.tablespace_name,count(*) files, null, SUM(tf.bytes)/1024/1024, null
FROM v$containers vc2, cdb_temp_files tf
WHERE vc2.con_id = tf.con_id
GROUP BY vc2.con_id, vc2.name , tf.tablespace_name
ORDER BY 1, 2; 

Troubleshooting ORA-600 [kcrfw_search_blklctn: Dead loop] and more about NSA process


Format: ORA-600 [kcrfw_search_blklctn: Dead loop] [a] [b] [c] [d] [e]. Seen on Oracle Data Guard 11.2.0.3 RAC on AIX. This error has no effect other than a small delay in Data Guard redo transport.

Error output

ORA-00600: internal error code, arguments: [kcrfw_search_blklctn: Dead loop], [], [], [], [], [], [], [], [], [], [], []
Closing Redo Read Context 
NSA2: Exception 600 encountered.. shutting down
NSA2: Doing a channel reset for next time around...

Trace file

Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
ORACLE_HOME = /oracle/app/oracle/product/11.2.0.3/dbhome_1
System name:	AIX
Node name:	anbob2
Release:	1
Version:	7
Machine:	00F636EE4C00
Instance name: dbanbob2
Redo thread mounted by this instance: 2
Oracle process number: 59
Unix process pid: 3801622, image: oracle@anbob2 (NSA2)


*** 2019-06-03 10:37:54.240
*** SESSION ID:(5547.1) 2019-06-03 10:37:54.240
*** CLIENT ID:() 2019-06-03 10:37:54.240
*** SERVICE NAME:(SYS$BACKGROUND) 2019-06-03 10:37:54.240
*** MODULE NAME:() 2019-06-03 10:37:54.240
*** ACTION NAME:() 2019-06-03 10:37:54.240
 
Dump continued from file: /oracle/app/oracle/diag/rdbms/stddbanbob/dbanbob2/trace/dbanbob2_nsa2_3801622.trc
ORA-00600: internal error code, arguments: [kcrfw_search_blklctn: Dead loop], [], [], [], [], [], [], [], [], [], [], []

----- SQL Statement (None) -----
Current SQL information unavailable - no cursor.

----- Call Stack Trace -----
kgeadse <-kgerinv_internal <-kgerinv <-kgeasnmierr <-
kcrfw_search_blklctn <-kcrfr_read_memory <-kcrfr_read <-
krsw_redo_push <-krsw_action_respond <-ksbabs <-
ksbrdp <-opirip <-opidrv <-sou2o <-opimai_real <-
ssthrdmain <-main

Root cause:
Bug 14582560, fixed in 12c (12.1.0.2, 12.2)

What is NSAn Process?

Starting with 11gR1, Oracle Data Guard asynchronous redo transport reads redo directly from the in-memory log buffer, provided that the requested redo blocks still reside in the log buffer and have not been reused for subsequent redo generation. If the redo blocks are no longer available in the log buffer, asynchronous redo transport reads the redo from the Online Redo Log (ORL). The main advantage of this change is fewer I/O calls issued to the ORL. 10g and 11g R1 use the LNS background process for redo transport; starting with 11g R2, the NSA background process is used.

Oracle externalizes log buffer hit ratios through the view x$logbuf_readhist.
BUFSIZE – Actual and estimated buffer sizes. The CURRENT row is the configured log buffer.
RDMEMBLKS – Actual and estimated reads from the log buffer
RDDISKBLKS – Actual and estimated reads from the Online Redo Log files
HITRATE – Memory hit ratio for the corresponding buffer size. CURRENT (BUFINFO) is the line for the present log buffer. The ratio is calculated as RDMEMBLKS / (RDMEMBLKS + RDDISKBLKS). In a healthy Data Guard environment the hit ratio should stay close to 100%.
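The HITRATE formula is easy to reproduce by hand from the two block counters, for example to cross-check a row of the view. A minimal sketch: the function name logbuf_hit_rate is made up for illustration, and the sample figures in the note below are invented, not taken from a real system.

```shell
# Memory hit ratio in percent: RDMEMBLKS / (RDMEMBLKS + RDDISKBLKS).
# Usage: logbuf_hit_rate <rdmemblks> <rddiskblks>
# logbuf_hit_rate is a hypothetical helper name.
logbuf_hit_rate() {
    awk -v mem="$1" -v disk="$2" 'BEGIN {
        if (mem + disk == 0) { print "n/a"; exit }   # nothing read yet
        printf "%.2f\n", 100 * mem / (mem + disk)
    }'
}
```

With 950 blocks read from memory and 50 from disk, logbuf_hit_rate 950 50 prints 95.00.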

Increasing LOG_BUFFER, or making sure there is no network bottleneck between the primary and the standby, may improve overall Data Guard sync performance.

11g R2 DG async transport:

DG ASYNC

In a 10g Oracle Data Guard environment, the Log Network Server (LNS) process transports redo from the primary to the standby site. 11g R2 introduced new background processes for this:
NSAn (Redo Transport NSA1 Process) is used on the primary database to ship redo data to the standby database when ASYNC mode is being used. There may be multiple NSA processes, such as NSA1 and NSA2.
NSSn (Redo Transport NSS1 Process) is also used on the primary database to ship redo data to the standby database, but only when SYNC mode is being used.

You will see new wait events for them:
‘SYNC Remote Write’ for all redo transport waits done by NSS processes
‘ASYNC Remote Write’ for all redo transport waits done by NSA processes

Note that since 11gR2 the writing to online redo logs and to standby are done in parallel.

In 11.2.0.3 you may see the message below in the LGWR trace file:

NSS2 is not running anymore.

This message should appear only when log_archive_trace is nonzero, so setting log_archive_trace = 0 should make it disappear. However, due to Bug 19177843, the message can be seen even when log_archive_trace is set to 0. The bug is fixed in 12.2.

FASTSYNC is a new LogXptMode for Data Guard in 12c. It enables Maximum Availability protection mode at larger distances with less performance impact than LogXptMode SYNC has had before.

OLD SYNC (note: from 11g R2 on, LNS is replaced by NSS)

sync

NEW FASTSYNC (note: from 11g R2 on, LNS is replaced by NSS)

 

References
FASTSYNC Redo Transport for Data Guard in #Oracle 12c
Franck Pachot’s Archive
MOS

Troubleshooting kernel: EXT4-fs warning (device dm-0): ext4_dx_add_entry: Directory index full!


The following error message appeared today in the operating system log of a customer's database host.

kernel: EXT4-fs warning (device dm-0): ext4_dx_add_entry: Directory index full!

The ‘directory index full’ error will be seen if there are lots of files/directories in the filesystem so that the tree reaches its indexing limits and cannot keep track further.

The directory index is an additional tree structure (an H-tree) that speeds up directory lookups on ext3/ext4 filesystems, improving performance for huge directories. XFS uses a B-tree instead.

ext3 and ext4 also limit the directory structure itself.

  • A directory on ext3 can have at most 31998 sub directories, because an inode can have at most 32000 links. This is one cause of the warning.
  • A directory on ext4 can have at most 64000 sub directories.

The size of each section of a directory index is limited by the filesystem’s block size. If very long filenames are used, fewer entries will fit in the block, leading to ‘directory index full’ errors earlier than they would occur with shorter filenames. This can become a bigger problem when the filesystem’s block size is small (1024-byte or 2048-byte blocks), but will occur with 4096-byte blocks as well.

Resolution

  • Remove unnecessary or unwanted files or directories.
  • Reorganize files and directories within the filesystem to reduce the number of entries in each individual directory.
  • Use shorter filenames.
  • Change the block size of the filesystem. This option requires a re-format of the filesystem, since the block size cannot be changed once it is set.

Troubleshoot

The message above refers to dm-0. What is dm-0? Here are several ways to find out.

--RHEL 7 demo
# ls -l /dev/mapper
total 0
crw------- 1 root root 10, 236 Jun  5 09:30 control
lrwxrwxrwx 1 root root       7 Jun  5 09:30 ol-root -> ../dm-0
lrwxrwxrwx 1 root root       7 Jun  5 09:30 ol-swap -> ../dm-1

[root@localhost dev]# df
Filesystem          1K-blocks     Used Available Use% Mounted on
devtmpfs               872420        0    872420   0% /dev
tmpfs                  891852        0    891852   0% /dev/shm
tmpfs                  891852     8552    883300   1% /run
tmpfs                  891852        0    891852   0% /sys/fs/cgroup
/dev/mapper/ol-root  49250820 23477076  25773744  48% /
/dev/sda1             1038336   172976    865360  17% /boot
tmpfs                  178372        0    178372   0% /run/user/0

# dmsetup info /dev/dm-0
Name:              ol-root
State:             ACTIVE
Read Ahead:        8192
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      251, 0
Number of targets: 1
UUID: LVM-v7EXsgVrgQJVUMmoT2D4aCgTohMyqV7QAcNZ4PcmMqEzRhzK1xXQEBHHRISKybaq

# lvdisplay|awk '/LV Name/{n=$3} /Block device/{d=$3; sub(".*:","dm-",d); print d,n;}'
dm-1 swap
dm-0 root
or

[root@localhost mapper]# lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda           8:0    0   50G  0 disk
├─sda1        8:1    0    1G  0 part /boot
└─sda2        8:2    0   49G  0 part
  ├─ol-root 251:0    0   47G  0 lvm  /
  └─ol-swap 251:1    0    2G  0 lvm  [SWAP]
sr0          11:0    1 1024M  0 rom

[root@localhost mapper]# findmnt /
TARGET SOURCE              FSTYPE OPTIONS
/      /dev/mapper/ol-root xfs    rw,relatime,attr2,inode64,noquota

Note:
dm-0 is the device mounted at “/”.

How to find the directory?

# cd /
# for dir in *; do echo "$dir"; find "./$dir" -type f | wc -l; done
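The loop above counts files per top-level directory, but the warning is about a single directory holding too many entries. The sketch below ranks individual directories by their entry count instead. The function name largest_dirs is made up for illustration, and it relies on GNU find (-printf), which is an assumption about the host.

```shell
# List the directories under a root that hold the most entries, largest first.
# Usage: largest_dirs <root> [top_n]      (needs GNU find for -printf)
# largest_dirs is a hypothetical helper name.
largest_dirs() {
    # %h prints the parent directory of every entry; -xdev stays on one filesystem.
    find "$1" -mindepth 1 -xdev -printf '%h\n' |
        sort | uniq -c | sort -rn | head -n "${2:-10}" |
        sed 's/^ *\([0-9][0-9]*\) /\1 /'
}
```

Run as root with largest_dirs / 5 to list the five fullest directories on the root filesystem. On a filesystem with a very large number of files the scan itself can take a long time.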

How can files and directories be removed quickly (emptying a directory)? Deleting billions of files directly with rm is usually not feasible.

[root@localhost ~]# mkdir dirs

[root@localhost ~]# cd dirs
[root@localhost dirs]# ls
[root@localhost dirs]# touch {0001..1000}.file
[root@localhost dirs]# ls
0001.file 0073.file 0145.file 0217.file 0289.file 0361.file 0433.file 0505.file 0577.file 0649.file 0721.file 0793.file 0865.file 0937.file
...
...
[root@localhost dirs]# perl -e 'for(<*>){((stat)[9]<(unlink))}'
[root@localhost dirs]# ls
[root@localhost dirs]# 
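An alternative to the perl one-liner is to let find do the deleting. Like perl, it never builds a huge argument list the way rm * does. This is a sketch assuming GNU find (-delete). The function name empty_dir is made up for illustration, and unlike the perl version it also removes subdirectories.

```shell
# Remove everything inside a directory but keep the directory itself.
# Usage: empty_dir <dir>                  (needs GNU find for -delete)
# empty_dir is a hypothetical helper name.
empty_dir() {
    [ -d "$1" ] || { echo "empty_dir: not a directory: $1" >&2; return 1; }
    # -delete implies depth-first order, so files go before their parent directories.
    find "$1" -mindepth 1 -delete
}
```

For the demo above, empty_dir /root/dirs would clear the directory. Double-check the path before running it, since there is no confirmation prompt.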

-- enjoy it --

References:

https://access.redhat.com/solutions/29894
