Channel: ANBOB

Oracle wait event “ges enter server mode”


From 10g onwards, instance recovery is done in two passes. The first pass scans the redo log files to identify the blocks that need recovery, and the second pass actually applies the redo to those blocks. In a RAC environment, the first-pass scan during instance recovery can be delayed by 300 ms to 1.5 s while waiting on the GRD (Global Resource Directory). The wait event recorded during that time is “ges enter server mode”.

It usually occurs when the RAC load is high, or in a Data Guard environment when redo apply is slow or hung for some reason. It is sometimes accompanied by ‘library cache lock’ or ‘cursor: pin S wait on X’ waits. The fix is to find the cause of the high load or of the recovery hang; in a Data Guard environment, try cancelling media recovery and restarting it. Collecting a systemstate dump for diagnosis is recommended.


Oracle wait event “enq: SQ - contention” and DBA_DB_LINK_SOURCES


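Starting with 12c, Oracle introduced DBA_DB_LINK_SOURCES (base table link_sources$), which records session information for remote DB links that have logged in to the local database (hostname, IP, dbname, username, logon_time, logon_count). In environments that use DB links you may sometimes see dblink sessions waiting on “enq: SQ - contention”. This wait event is a familiar one and is related to sequences: check whether the sequence is being called very frequently with too small a cache, or whether the row cache had a problem at the time.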

Case

USERNAME           SID EVENT                MACHINE    MODULE               STATUS   LAST_CALL_ET SQL_ID          WAI_SECINW ROW_WAIT_OBJ# SQLTEXT                        BS          CH# OSUSER     HEX
----------- ---------- -------------------- ---------- -------------------- -------- ------------ --------------- ---------- ------------- ------------------------------ ---------- ---- ---------- ---------
ANBOB             7902 enq: SQ - contention anbobd1     oracle               ACTIVE              4 d2217udafsm66   0:3                   -1 insert into link_sources$( sou :           147 grid         17562c4
ANBOB             8104 enq: SQ - contention anbobc1     oracle               ACTIVE              4 d2217udafsm66   0:3                   -1 insert into link_sources$( sou :           147 grid         17562be
ANBOB             8758 enq: SQ - contention anbobb1     oracle               ACTIVE              4 d2217udafsm66   0:3                   -1 insert into link_sources$( sou :           147 grid         17562c3
ANBOB               90 enq: SQ - contention anboba1     oracle               ACTIVE              4 d2217udafsm66   0:3                   -1 insert into link_sources$( sou :           148 grid         17562c0
ANBOB            11764 enq: SQ - contention anbobd1     oracle               ACTIVE              4 d2217udafsm66   0:3                   -1 insert into link_sources$( sou :           148 grid         17562bc
ANBOB             7636 enq: SQ - contention anboba1     oracle               ACTIVE              4 d2217udafsm66   0:3                   -1 insert into link_sources$( sou :           148 grid         17562bd
ANBOB              557 enq: SQ - contention anboba1     oracle               ACTIVE              4 d2217udafsm66   0:3                   -1 insert into link_sources$( sou :           149 grid         17562c1
ANBOB             2638 enq: SQ - contention anboba2     oracle               ACTIVE              4 d2217udafsm66   0:3                   -1 insert into link_sources$( sou :           148 grid         17562bf

These are clearly DB link sessions. The SQL they were executing at the time:

SQL_ID: d2217udafsm66
insert into link_sources$( source_id, username, user#,
first_logon_time, last_logon_time, logon_count, db_name, dbid,
host_name,  ip_address, protocol, db_unique_name)  VALUES
(link_source_id_seq.nextval, :usrnm , :usri, SYSTIMESTAMP AT TIME ZONE
'UTC' , SYSTIMESTAMP AT TIME ZONE 'UTC', 1, :dbldbn, :dbldbi,
SYS_CONTEXT('USERENV', 'HOST'), SYS_CONTEXT('USERENV', 'IP_ADDRESS'),
SYS_CONTEXT('USERENV', 'NETWORK_PROTOCOL'),
SUBSTR(SYS_CONTEXT('USERENV', 'DBLINK_INFO'),20,
INSTR(SYS_CONTEXT('USERENV', 'DBLINK_INFO'),',',1,1)-20))
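
The last column of that INSERT is easier to follow with a small emulation. The sketch below mimics Oracle's 1-based SUBSTR and INSTR in Python to show what the expression extracts. It assumes the DBLINK_INFO string has the form "SOURCE_GLOBAL_NAME=..., DBLINK_NAME=..., SOURCE_AUDIT_SESSIONID=...", which is how the USERENV attribute is documented as far as I recall; the sample value and the helper names `substr` and `instr` are made up for illustration.

```python
def substr(text, start, length):
    """Emulate Oracle SUBSTR(text, start, length) with a 1-based start."""
    return text[start - 1:start - 1 + length]


def instr(text, needle):
    """Emulate Oracle INSTR(text, needle, 1, 1): 1-based position, 0 if absent."""
    return text.find(needle) + 1


# Hypothetical DBLINK_INFO value; a real one comes from SYS_CONTEXT('USERENV', 'DBLINK_INFO').
dblink_info = (
    "SOURCE_GLOBAL_NAME=SRCDB.EXAMPLE.COM, "
    "DBLINK_NAME=MY_LINK.EXAMPLE.COM, "
    "SOURCE_AUDIT_SESSIONID=12345"
)

# "SOURCE_GLOBAL_NAME=" is 19 characters long, so the value starts at position 20
# and runs up to (not including) the first comma.
source_name = substr(dblink_info, 20, instr(dblink_info, ",") - 20)
print(source_name)
```

Under that assumption, the value stored in the db_unique_name column is whatever follows "SOURCE_GLOBAL_NAME=" up to the first comma.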

SQL> @seq LINK_SOURCE_ID_SEQ

SEQUENCE_OWNER              SEQUENCE_NAME                MIN_VALUE  MAX_VALUE INCREMENT_BY C O CACHE_SIZE LAST_NUMBER S E S K
--------------------------- --------------------------- ---------- ---------- ------------ - - ---------- ----------- - - - -
SYS                         LINK_SOURCE_ID_SEQ                   1 1.0000E+28            1 N N         10     9415641 N N N N

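Note:
The sessions were inserting into the system table link_sources$, the base table of DBA_DB_LINK_SOURCES, which records DB link access. The sequence it uses is LINK_SOURCE_ID_SEQ, and its current cache value is 10. From experience, besides the sys.AUDSES$ system sequence used by the AUDIT feature, there is now one more sequence in Oracle whose cache is worth increasing in advance: LINK_SOURCE_ID_SEQ.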
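
To get a feel for what a cache of 10 means here, the following back-of-the-envelope sketch estimates how many times the cached range had to be refilled to reach the LAST_NUMBER shown above. Each refill updates the sequence's dictionary row, which is the serialized step behind SQ enqueue waits. It assumes the sequence started at MIN_VALUE 1 with INCREMENT_BY 1 and ignores cached values lost at instance restarts, so the figures are rough; the function name `cache_refills` is hypothetical.

```python
def cache_refills(last_number, cache_size, start=1, increment=1):
    """Rough count of cache refills needed to advance from start to last_number."""
    values_allocated = (last_number - start) // increment
    # Ceiling division with integers only, to avoid floating-point rounding.
    return -(-values_allocated // cache_size)


LAST_NUMBER = 9415641  # from the @seq output for SYS.LINK_SOURCE_ID_SEQ

for cache_size in (10, 100, 1000):
    refills = cache_refills(LAST_NUMBER, cache_size)
    print(f"CACHE {cache_size:>5}: about {refills:,} refills")
```

The refill count falls in proportion to the cache size, which is why a larger cache is the usual first step for a hot sequence.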

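When a system calls DB links this frequently, it is worth asking whether the application design is reasonable, whether SQL execution time is being wasted on DB link network round trips, and whether the library cache had a performance problem at the time, for example accompanying library cache lock waits.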

DBA_DB_LINK_SOURCES Bugs
Since DBA_DB_LINK_SOURCES was introduced, some lower versions (11g) may hit ORA-603 and ORA-3106 when accessing 12c or newer versions.

 

How to disable DBA_DB_LINK_SOURCES ?
set "_db_link_sources_tracking"=FALSE

Oracle 19c RAC restarts frequently, OS log shows “avahi-daemon: Withdrawing address record”


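There are always some innovative customers at the leading edge of technology, but the biggest worry is hitting problems for which no reference exists. Recently I saw this in a very new environment: a 2-node Oracle 19c RAC on an IBM LinuxONE mainframe. On some of the nodes hosted on the same mainframe, the Oracle instances restarted frequently, and before each restart the OS log showed “avahi-daemon[4537]: Withdrawing address record for 28.83.70.4 on bond0.3112”. This post briefly records the case.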

Environment information (from the DB alert log)

NOTE: remote asm mode is remote (mode 0x2; from cluster type)
NOTE: Cluster configuration type = CLUSTER [4]
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.4.0.0.0.
ORACLE_HOME:    /u01/app/oracle/product/19.0.0/dbhome_1
System name:	Linux
Node name:	anboblpa4
Release:	3.10.0-693.el7.s390x
Version:	#1 SMP Thu Jul 6 20:01:53 EDT 2017
Machine:	s390x
...
Dynamic remastering is disabled
List of instances (total 2) :
 1 2
My inst 2 (I'm a new instance) 
...
Ping without log force is disabled:
  not an Exadata system.
Buffer Cache Full DB Caching mode changing from FULL CACHING DISABLED to FULL CACHING ENABLED 
Picked broadcast on commit scheme to generate SCNs
Endian type of dictionary set to big
2019-10-11T14:07:47.939549+08:00
Redo log for group 3, sequence 1 is not located on DAX storage
...
===========================================================
Dumping current patch information
===========================================================
Patch Id: 29585399
Patch Description: OCW RELEASE UPDATE 19.3.0.0.0 (29585399)
Patch Apply Time: 2019-06-02T04:42:42+08:00

...
Patch Id: 29834717
Patch Description: Database Release Update : 19.4.0.0.190716 (29834717)
Patch Apply Time: 2019-10-09T13:50:10+08:00
...
Patch Id: 29774421
Patch Description: OJVM RELEASE UPDATE: 19.4.0.0.190716 (29774421)
Patch Apply Time: 2019-10-09T13:51:42+08:00

Database alert log

2020-01-04T10:04:31.601196+08:00
Thread 2 advanced to log sequence 73905 (LGWR switch)
  Current log# 24 seq# 73905 mem# 0: +DATADG/A2CDB/ONLINELOG/group_24.405.1028475833
  Current log# 24 seq# 73905 mem# 1: +DATADG/A2CDB/ONLINELOG/group_24.392.1028475843
2020-01-04T10:04:31.747748+08:00
ARC1 (PID:9942): Archived Log entry 38639 added for T-2.S-73904 ID 0xffffffff9876a713 LAD:1
2020-01-04T10:17:28.893387+08:00
Thread 2 advanced to log sequence 73906 (LGWR switch)
  Current log# 25 seq# 73906 mem# 0: +DATADG/A2CDB/ONLINELOG/group_25.404.1028475833
  Current log# 25 seq# 73906 mem# 1: +DATADG/A2CDB/ONLINELOG/group_25.388.1028475843
2020-01-04T10:17:29.016138+08:00
ARC2 (PID:9944): Archived Log entry 38643 added for T-2.S-73905 ID 0xffffffff9876a713 LAD:1
2020-01-04T10:29:23.921516+08:00
PMON (ospid: ): terminating the instance due to ORA error 
2020-01-04T10:29:23.921625+08:00
Cause - 'Instance is being terminated due to fatal process death (pid: 5, ospid: 8726, IPC0)'
2020-01-04T10:29:25.777379+08:00
Instance terminated by PMON, pid = 8718
2020-01-04T10:32:01.383411+08:00
Starting ORACLE instance (normal) (OS id: 8531)
....
2020-01-04T10:32:30.496195+08:00
CJQ0 started with pid=96, OS id=10437 
Completed: ALTER DATABASE OPEN /* db agent *//* {2:17390:2} */

CRSD.LOG

## crsd.log
2020-01-04 10:27:31.168 :GIPCHTHR:2329880848:  gipchaWorkerWork: workerThread heart beat, time interval since last heartBeat 30041 loopCount 60 sendCount 12 recvCount 36 postCount 12 sendCmplCount 12 recvCmplCount 12
2020-01-04 10:27:32.269 :GIPCHTHR:2327783696:  gipchaDaemonWork: DaemonThread heart beat, time interval since last heartBeat 30041loopCount 47
2020-01-04 10:28:01.229 :GIPCHTHR:2329880848:  gipchaWorkerWork: workerThread heart beat, time interval since last heartBeat 30061 loopCount 62 sendCount 12 recvCount 36 postCount 12 sendCmplCount 12 recvCmplCount 12
2020-01-04 10:28:02.331 :GIPCHTHR:2327783696:  gipchaDaemonWork: DaemonThread heart beat, time interval since last heartBeat 30062loopCount 46
2020-01-04 10:28:31.273 :GIPCHTHR:2329880848:  gipchaWorkerWork: workerThread heart beat, time interval since last heartBeat 30044 loopCount 67 sendCount 14 recvCount 42 postCount 14 sendCmplCount 14 recvCmplCount 14
2020-01-04 10:28:32.374 :GIPCHTHR:2327783696:  gipchaDaemonWork: DaemonThread heart beat, time interval since last heartBeat 30043loopCount 46
2020-01-04 10:29:01.315 :GIPCHTHR:2329880848:  gipchaWorkerWork: workerThread heart beat, time interval since last heartBeat 30042 loopCount 60 sendCount 12 recvCount 36 postCount 12 sendCmplCount 12 recvCmplCount 12
2020-01-04 10:29:02.415 :GIPCHTHR:2327783696:  gipchaDaemonWork: DaemonThread heart beat, time interval since last heartBeat 30041loopCount 46
2020-01-04 10:29:19.978 :UiServer:749717776: [     INFO] {2:17417:6560} Container [ Name: FENCESERVER
	API_HDR_VER: 
	TextMessage[3]
	CLIENT: 
	TextMessage[]
	CLIENT_NAME: 
	TextMessage[ocssd.bin]
	CLIENT_PID: 
	TextMessage[6809]
	CLIENT_PRIMARY_GROUP: 
	TextMessage[asmadmin]
	LOCALE: 
	TextMessage[AMERICAN_AMERICA.AL32UTF8]
]
2020-01-04 10:29:19.978 :UiServer:749717776: [     INFO] {2:17417:6560} Sending message to AGFW. ctx= 0x3fec404e010, Client PID: 6809
2020-01-04 10:29:19.978 :  OCRAPI:749717776: procr_beg_asmshut: OCR ctx set to donotterminate state. Return [0].  <<<<<<<<<<<<<<<<<<<
2020-01-04 10:29:19.978 :UiServer:749717776: [     INFO] {2:17417:6560} Force-disconnecting [21]  existing PE clients...
2020-01-04 10:29:19.978 :UiServer:749717776: [     INFO] {2:17417:6560} Disconnecting client of command id :28
2020-01-04 10:29:19.978 :UiServer:749717776: [     INFO] {2:17417:6560} Disconnecting client of command id :43
2020-01-04 10:29:19.978 :UiServer:749717776: [     INFO] {2:17417:6560} Disconnecting client of command id :46
...
2020-01-04 10:29:19.979 :UiServer:749717776: [     INFO] {2:17417:6560} Disconnecting client of command id :272
2020-01-04 10:29:19.979 :UiServer:749717776: [     INFO] {2:17417:6560} Disconnecting client of command id :273
2020-01-04 10:29:19.979 :UiServer:749717776: [     INFO] {2:17417:6560} Disconnecting client of command id :275
2020-01-04 10:29:19.979 :UiServer:749717776: [     INFO] {2:17417:6560} Disconnecting client of command id :292
2020-01-04 10:29:19.979 :UiServer:749717776: [     INFO] {2:17417:6560} Sending message: 554957 to AGFW proxy server.
2020-01-04 10:29:19.979 :    AGFW:768592144: [     INFO] {2:17417:6560} Agfw Proxy Server received the message: FENCE_CMD[Proxy] ID 20489:554957   
2020-01-04 10:29:19.979 :    AGFW:768592144: [     INFO] {2:17417:6560} Agfw Proxy Server sending message: RESOURCE_CLEAN[ora.ASMNET1LSNR_ASM.lsnr 2 1] ID 4100:554958 to the agent /u01/app/19.0.0/grid/bin/oraagent_grid
2020-01-04 10:29:19.979 :   CRSPE:768592144: [     INFO] {2:17417:6560} Skipping Fence of : ora.CRSDG.dg
2020-01-04 10:29:19.979 :   CRSPE:768592144: [     INFO] {2:17417:6560} Skipping Fence of : ora.DATADG.dg
2020-01-04 10:29:19.979 :   CRSPE:768592144: [     INFO] {2:17417:6560} Skipping Fence of : ora.FRADG.dg
2020-01-04 10:29:19.979 :    AGFW:768592144: [     INFO] {2:17417:6560} Agfw Proxy Server sending message: RESOURCE_CLEAN[ora.LISTENER.lsnr anboblpa4 1] ID 4100:554959 to the agent /u01/app/19.0.0/grid/bin/oraagent_grid
2020-01-04 10:29:19.979 :   CRSPE:768592144: [     INFO] {2:17417:6560} Skipping Fence of : ora.a2cdb.db
2020-01-04 10:29:19.979 :   CRSPE:768592144: [     INFO] {2:17417:6560} Skipping Fence of : ora.asm
2020-01-04 10:29:19.979 :    AGFW:768592144: [     INFO] {2:17417:6560} Agfw Proxy Server sending message: RESOURCE_CLEAN[ora.asmnet1.asmnetwork 2 1] ID 4100:554960 to the agent /u01/app/19.0.0/grid/bin/orarootagent_root
2020-01-04 10:29:19.979 :    AGFW:768592144: [     INFO] {2:17417:6560} Agfw Proxy Server sending message: RESOURCE_CLEAN[ora.net1.network anboblpa4 1] ID 4100:554961 to the agent /u01/app/19.0.0/grid/bin/orarootagent_root
2020-01-04 10:29:19.979 :    AGFW:768592144: [     INFO] {2:17417:6560} Agfw Proxy Server sending message: RESOURCE_CLEAN[ora.ons anboblpa4 1] ID 4100:554962 to the agent /u01/app/19.0.0/grid/bin/oraagent_grid
2020-01-04 10:29:19.979 :    AGFW:768592144: [     INFO] {2:17417:6560} Agfw Proxy Server sending message: RESOURCE_CLEAN[ora.anboblpa4.vip 1 1] ID 4100:554963 to the agent /u01/app/19.0.0/grid/bin/orarootagent_root
2020-01-04 10:29:19.979 :    AGFW:768592144: [     INFO] {2:17417:6560} dumpPendingFences:Started Resource Fencing, Count = 6
2020-01-04 10:29:19.979 :    AGFW:768592144: [     INFO] {2:17417:6560} ora.ASMNET1LSNR_ASM.lsnr 2 1;ora.LISTENER.lsnr anboblpa4 1;ora.asmnet1.asmnetwork 2 1;ora.net1.network anboblpa4 1;ora.ons anboblpa4 1;ora.anboblpa4.vip 1 1;---
2020-01-04 10:29:19.981 :    AGFW:768592144: [     INFO] {2:17417:6560} Received the reply to the message: RESOURCE_CLEAN[ora.asmnet1.asmnetwork 2 1] ID 4100:554960 from the agent /u01/app/19.0.0/grid/bin/orarootagent_root
2020-01-04 10:29:19.981 :    AGFW:768592144: [     INFO] {2:17417:6560} Received the reply to the message: RESOURCE_CLEAN[ora.asmnet1.asmnetwork 2 1] ID 4100:554960 from the agent /u01/app/19.0.0/grid/bin/orarootagent_root
2020-01-04 10:29:19.981 :    AGFW:768592144: [     INFO] {2:17417:6560} Fenced off the resource [ora.asmnet1.asmnetwork]  
2020-01-04 10:29:19.981 :    AGFW:768592144: [     INFO] {2:17417:6560} dumpPendingFences:Pending Fence(s), Count = 5
2020-01-04 10:29:19.981 :    AGFW:768592144: [     INFO] {2:17417:6560} ora.ASMNET1LSNR_ASM.lsnr 2 1;ora.LISTENER.lsnr anboblpa4 1;ora.net1.network anboblpa4 1;ora.ons anboblpa4 1;ora.anboblpa4.vip 1 1;---
2020-01-04 10:29:19.982 :    AGFW:768592144: [     INFO] {2:17417:6560} Received the reply to the message: RESOURCE_CLEAN[ora.net1.network anboblpa4 1] ID 4100:554961 from the agent /u01/app/19.0.0/grid/bin/orarootagent_root
2020-01-04 10:29:19.982 :    AGFW:768592144: [     INFO] {2:17417:6560} Received the reply to the message: RESOURCE_CLEAN[ora.net1.network anboblpa4 1] ID 4100:554961 from the agent /u01/app/19.0.0/grid/bin/orarootagent_root
2020-01-04 10:29:19.982 :    AGFW:768592144: [     INFO] {2:17417:6560} Fenced off the resource [ora.net1.network]  
2020-01-04 10:29:19.982 :    AGFW:768592144: [     INFO] {2:17417:6560} dumpPendingFences:Pending Fence(s), Count = 4
2020-01-04 10:29:19.982 :    AGFW:768592144: [     INFO] {2:17417:6560} ora.ASMNET1LSNR_ASM.lsnr 2 1;ora.LISTENER.lsnr anboblpa4 1;ora.ons anboblpa4 1;ora.anboblpa4.vip 1 1;---
2020-01-04 10:29:19.984 :    AGFW:768592144: [     INFO] {2:17417:6560} Received the reply to the message: RESOURCE_CLEAN[ora.ASMNET1LSNR_ASM.lsnr 2 1] ID 4100:554958 from the agent /u01/app/19.0.0/grid/bin/oraagent_grid
2020-01-04 10:29:19.984 :    AGFW:768592144: [     INFO] {2:17417:6560} Received the reply to the message: RESOURCE_CLEAN[ora.LISTENER.lsnr anboblpa4 1] ID 4100:554959 from the agent /u01/app/19.0.0/grid/bin/oraagent_grid
2020-01-04 10:29:20.089 :    AGFW:768592144: [     INFO] {2:17417:6560} Received the reply to the message: RESOURCE_CLEAN[ora.anboblpa4.vip 1 1] ID 4100:554963 from the agent /u01/app/19.0.0/grid/bin/orarootagent_root
2020-01-04 10:29:20.089 :    AGFW:768592144: [     INFO] {2:17417:6560} Received the reply to the message: RESOURCE_CLEAN[ora.anboblpa4.vip 1 1] ID 4100:554963 from the agent /u01/app/19.0.0/grid/bin/orarootagent_root
2020-01-04 10:29:20.089 :    AGFW:768592144: [     INFO] {2:17417:6560} Fenced off the resource [ora.anboblpa4.vip] <<<<<<<<<<<<<<<<<<<
2020-01-04 10:29:20.089 :    AGFW:768592144: [     INFO] {2:17417:6560} dumpPendingFences:Pending Fence(s), Count = 3
2020-01-04 10:29:20.089 :    AGFW:768592144: [     INFO] {2:17417:6560} ora.ASMNET1LSNR_ASM.lsnr 2 1;ora.LISTENER.lsnr anboblpa4 1;ora.ons anboblpa4 1;---
...
2020-01-04 10:29:21.640 :  OCRSRV:805304592: proas_amiwriter: ctx is in some other state
2020-01-04 10:29:21.687 :    AGFW:768592144: [     INFO] {2:17417:6560} Received the reply to the message: RESOURCE_CLEAN[ora.LISTENER.lsnr anboblpa4 1] ID 4100:554959 from the agent /u01/app/19.0.0/grid/bin/oraagent_grid
2020-01-04 10:29:21.687 :    AGFW:768592144: [     INFO] {2:17417:6560} Fenced off the resource [ora.LISTENER.lsnr]
2020-01-04 10:29:21.687 :    AGFW:768592144: [     INFO] {2:17417:6560} dumpPendingFences:Pending Fence(s), Count = 1
2020-01-04 10:29:21.687 :    AGFW:768592144: [     INFO] {2:17417:6560} ora.ASMNET1LSNR_ASM.lsnr 2 1;---
2020-01-04 10:29:21.687 :    AGFW:768592144: [     INFO] {2:17417:6560} Received the reply to the message: RESOURCE_CLEAN[ora.ASMNET1LSNR_ASM.lsnr 2 1] ID 4100:554958 from the agent /u01/app/19.0.0/grid/bin/oraagent_grid
2020-01-04 10:29:21.687 :    AGFW:768592144: [     INFO] {2:17417:6560} Fenced off the resource [ora.ASMNET1LSNR_ASM.lsnr]
2020-01-04 10:29:21.687 :    AGFW:768592144: [     INFO] {2:17417:6560} dumpPendingFences:Pending Fence(s), Count = 0
2020-01-04 10:29:21.688 :   CRSPE:768592144: [     INFO] {2:17417:6560} Fence command status: UI_DATA
2020-01-04 10:29:21.688 :    AGFW:768592144: [     INFO] {2:17417:6560} Fence command completed, rc = 0
2020-01-04 10:29:21.688 :UiServer:749717776: [     INFO] {2:17417:6560} UI server recvd reply from Agfw Proxy Server: 116530
2020-01-04 10:29:21.688 :UiServer:749717776: [     INFO] {2:17417:6560} Response: c1|7!UI_DATAk6|RESULTt1|0
2020-01-04 10:29:21.688 :UiServer:749717776: [     INFO] {2:17417:6560} Done for ctx=0x3fec404e010
2020-01-04 10:29:21.690 :  OCRSRV:805304592: proas_amiwriter: ctx is in some other state
Trace file /u01/app/grid/diag/crs/anboblpa4/crs/trace/crsd.trc
Oracle Database 19c Clusterware Release 19.0.0.0.0 - Production
Version 19.4.0.0.0 Copyright 1996, 2019 Oracle. All rights reserved.
 default:2381237440: 1: clskec:has:CLSU:910 4 args[clsdAdr_CLSK_err][mod=clsdadr.c][loc=(:CLSD00302:)][msg=2020-01-04 10:29:21.861 (:CLSD00302:) Trace file size and number of segments fetched from environment variable: ORA_DAEMON_TRACE_FILE_OPTIONS filesize=26214400,numsegments=10    Detected in function clsdAdrGetEnvVar_TFO at line number 6819. ]
...
    CLSB:2652294336: [     INFO] Argument count (argc) for this daemon is 2
    CLSB:2652294336: [     INFO] Argument 0 is: /u01/app/19.0.0/grid/bin/crsd.bin
    CLSB:2652294336: [     INFO] Argument 1 is: reboot
2020-01-04 10:31:18.928 : CRSMAIN:2652294336: [     INFO]  First attempt: init CSS context succeeded.
2020-01-04 10:31:18.928 : CRSMAIN:2652294336: [     INFO]  Start mode: normal
2020-01-04 10:31:18.930 :  CLSDMT:2146957584: [     INFO] PID for the Process [7535], connkey CRSD
2020-01-04 10:31:19.305 : CRSMAIN:2652294336: [     INFO]  CRS Daemon Starting

ocssd.log

2020-01-04 10:29:15.128 :    CSSD:2514209040: [     INFO]   : Sending member data change to GMP for group HB+ASM, memberID 17:2:2
2020-01-04 10:29:15.128 :    CSSD:2526538000: [     INFO] clssgmpcMemberDataUpdt: grockName HB+ASM memberID 17:2:2, datatype 1 datasize 4
2020-01-04 10:29:15.129 :    CSSD:2511063312: [     INFO] clssgmcpDataUpdtCmpl: Status 0 mbr data updt memberID 17:2:2 from clientID 2:49:2
2020-01-04 10:29:16.683 :    CSSD:2520246544: [     INFO] clssgmpcGMCReqWorkerThread: processing msg (0x3ff74043d70) type 2, msg size 76, payload (0x3ff74043d9c) size 32, sequence 1505227, for clientID 2:49:2
2020-01-04 10:29:17.178 :    CSSD:2514209040: [     INFO]   : Processing member data change type 1, size 4 for group HB+ASM, memberID 17:2:2
2020-01-04 10:29:17.178 :    CSSD:2514209040: [     INFO]   : Sending member data change to GMP for group HB+ASM, memberID 17:2:2
2020-01-04 10:29:17.178 :    CSSD:2526538000: [     INFO] clssgmpcMemberDataUpdt: grockName HB+ASM memberID 17:2:2, datatype 1 datasize 4
2020-01-04 10:29:17.179 :    CSSD:2511063312: [     INFO] clssgmcpDataUpdtCmpl: Status 0 mbr data updt memberID 17:2:2 from clientID 2:49:2
2020-01-04 10:29:18.229 :    CSSD:2511063312: [     INFO] clssgmcpGroupDataResp: sending type 5, size 166, status 0 to clientID 2:24:0
2020-01-04 10:29:18.229 :    CSSD:2514209040: [     INFO] clssgmMemberPublicInfo: group DG_CRSDG member 0 not found
2020-01-04 10:29:18.229 :    CSSD:2517100816: [     INFO] clssgmpcGMCReqWorkerThread: processing msg (0x3ff7403e390) type 2, msg size 85, payload (0x3ff7403e3bc) size 41, sequence 1505230, for clientID 2:24:0
2020-01-04 10:29:18.552 :    CSSD:839383312: [     INFO] clssnmSendingThread: sending status msg to all nodes
2020-01-04 10:29:18.552 :    CSSD:839383312: [     INFO] clssnmSendingThread: sent 4 status msgs to all nodes
2020-01-04 10:29:19.238 :    CSSD:2514209040: [     INFO]   : Processing member data change type 1, size 4 for group HB+ASM, memberID 17:2:2
2020-01-04 10:29:19.238 :    CSSD:2514209040: [     INFO]   : Sending member data change to GMP for group HB+ASM, memberID 17:2:2
2020-01-04 10:29:19.238 :    CSSD:2526538000: [     INFO] clssgmpcMemberDataUpdt: grockName HB+ASM memberID 17:2:2, datatype 1 datasize 4
2020-01-04 10:29:19.239 :    CSSD:2511063312: [     INFO] clssgmcpDataUpdtCmpl: Status 0 mbr data updt memberID 17:2:2 from clientID 2:49:2
2020-01-04 10:29:19.908 :    CSSD:2514209040: [     INFO]   : Processing member data change type 1, size 600 for group GR+DB_A2CDB, memberID 119:2:1  
2020-01-04 10:29:19.908 :    CSSD:2514209040: [     INFO]   : Sending member data change to GMP for group GR+DB_A2CDB, memberID 119:2:1 
2020-01-04 10:29:19.908 :    CSSD:2526538000: [     INFO] clssgmpcMemberDataUpdt: grockName GR+DB_A2CDB memberID 119:2:1, datatype 1 datasize 600
2020-01-04 10:29:19.909 :    CSSD:2511063312: [     INFO] clssgmSendEventsToMbrs: Group GR+DB_A2CDB, member count 1, event master 0, event type 6, event incarn 21350, event member count 1, pids 8779-13034,  
2020-01-04 10:29:19.909 :    CSSD:2511063312: [     INFO] clssgmcpDataUpdtCmpl: Status 0 mbr data updt memberID 119:2:1 from clientID 2:87:2
2020-01-04 10:29:19.909 :    CSSD:2514209040: [     INFO] clssgmTermMember: Terminating memberID 119:2:1 (0x3ff400783a0) in grock GR+DB_A2CDB <<<<<<<<<<<<<<<<<<<
2020-01-04 10:29:19.909 :    CSSD:2514209040: ASSERT clsssc.c 8607
2020-01-04 10:29:19.909 :    CSSD:2514209040: clssscRefFree: object(0x3ff4842b670) has 0 reference prior to decrement, object may have been deallocated! <<<<<<<<<<<<<<<<<<<
2020-01-04 10:29:19.909 :    CSSD:2514209040: [     INFO] clssnmCheckForNetworkFailure: Entered 
2020-01-04 10:29:19.909 :    CSSD:2514209040: [     INFO] clssnmCheckForNetworkFailure: skipping 0 defined 0 
2020-01-04 10:29:19.909 :    CSSD:2514209040: [     INFO] clssnmCheckForNetworkFailure: expiring 0  evicted 0 evicting node 0 this node 1
2020-01-04 10:29:19.909 :    CSSD:2514209040: [     INFO] clssnmCheckForNetworkFailure: expiring 0  evicted 0 evicting node 0 this node 2
2020-01-04 10:29:19.909 :    CSSD:2514209040: [     INFO] clssnmCheckForNetworkFailure: skipping 3 defined 0 
2020-01-04 10:29:19.909 :    CSSD:2514209040: [     INFO] clssnmCheckForNetworkFailure: skipping 4 defined 0 
... 
2020-01-04 10:29:19.909 :    CSSD:2514209040: [     INFO] clssnmCheckForNetworkFailure: skipping 30 defined 0 
2020-01-04 10:29:19.909 :    CSSD:2514209040: [     INFO] clssnmCheckForNetworkFailure: skipping 31 defined 0 
2020-01-04 10:29:19.909 :    CSSD:2514209040: [     INFO] clssscExit: Call to clscal flush successful and clearing the CLSSSCCTX_INIT_CALOG flag so that no further CA logging happens
2020-01-04 10:29:19.909 :    CSSD:2514209040: [     INFO] clssnmRemoveNodeInTerm: node 2, anboblpa4 terminated. Removing from its own member and connected bitmaps
2020-01-04 10:29:19.909 :    CSSD:2514209040: [    ERROR] ###################################
2020-01-04 10:29:19.909 :    CSSD:2514209040: [    ERROR] clssscExit: CSSD aborting from thread GMClientListener
2020-01-04 10:29:19.909 :    CSSD:2514209040: [    ERROR] ###################################
2020-01-04 10:29:19.909 :    CSSD:2514209040: [     INFO] (:CSSSC00012:)clssscExit: A fatal error occurred and the CSS daemon is terminating abnormally
2020-01-04 10:29:19.909 :    CSSD:2514209040: [     INFO] ####### Begin Diagnostic Dump #######
2020-01-04 10:29:19.910 :    CSSD:2514209040: [     INFO] ### Begin diagnostic data for the Core layer ###
2020-01-04 10:29:19.910 :    CSSD:2514209040: [     INFO] Initialization successfully completed OK
2020-01-04 10:29:19.910 :    CSSD:2514209040: [     INFO] Initialization of EXADATA fencing successfully completed OK
2020-01-04 10:29:19.910 :    CSSD:2514209040: [     INFO] #### End diagnostic data for the Core layer ####
2020-01-04 10:29:19.910 :    CSSD:2514209040: [     INFO] ### Begin diagnostic data for the GM Client layer ###
2020-01-04 10:29:19.910 :    CSSD:2514209040: Status for clientID 2:11:2, pid(6609-5118), GIPC endpt 0x261e, flags 0x0004, refcount 3, aborted at 0, fence is not progress   OK
2020-01-04 10:29:19.910 :    CSSD:2514209040: [     INFO]   memberID 5:2:2, group EVMDMAIN2 refcount 3, state 0x0000, granted 0, fence is not in progress  OK
2020-01-04 10:29:19.910 :    CSSD:2514209040: Status for clientID 2:13:1, pid(6758-5541), GIPC endpt 0x3ebd, flags 0x0004, refcount 3, aborted at 0, fence is not progress   OK
2020-01-04 10:29:19.910 :    CSSD:2514209040: [     INFO]   memberID 11:2:0, group CRF- refcount 3, state 0x0000, granted 0, fence is not in progress  OK
...
2020-01-04 10:29:19.910 :    CSSD:2514209040: [     INFO]   memberID 26:2:2, group ocr_dblpa4-cluster refcount 3, state 0x0000, granted 0, fence is not in progress  OK
2020-01-04 10:29:19.910 :    CSSD:2514209040: Status for clientID 2:29:4, pid(7488-7910), GIPC endpt 0x3a1b, flags 0x0006, refcount 3, aborted at 0, fence is not progress   OK
2020-01-04 10:29:19.910 :    CSSD:2514209040: [     INFO]   memberID 7:2:1, group crs_version refcount 3, state 0x0000, granted 0, fence is not in progress  OK
...
01-04 10:29:19.911 :    CSSD:2514209040: Status for clientID 2:579:2, pid(9208-13294), GIPC endpt 0x45fbd4, flags 0x0002, refcount 2, aborted at 0, fence is not progress   OK
2020-01-04 10:29:19.911 :    CSSD:2514209040: Status for clientID 2:84:1, pid(8726-13006), GIPC endpt 0xbc75, flags 0x3000, refcount 3, aborted at 0, fence is not progress   OK
2020-01-04 10:29:19.911 :    CSSD:2514209040: [     INFO]   memberID 104:2:1, group IPC0_GRP_dblpa4-cluster_a2cdb refcount 3, state 0x0000, granted 0, fence is not in progress  OK
2020-01-04 10:29:19.911 :    CSSD:2514209040: Status for clientID 2:85:1, pid(8742-13021), GIPC endpt 0xbe61, flags 0x2000, refcount 3, aborted at 0, fence is not progress   OK
----- Call Stack Trace -----
2020-01-04 10:29:19.913 :    CSSD:2514209040: calling              call     entry                argument values in hex      
2020-01-04 10:29:19.913 :    CSSD:2514209040: location             type     point                (? means dubious value)     
2020-01-04 10:29:19.913 :    CSSD:2514209040: -------------------- -------- -------------------- ----------------------------
2020-01-04 10:29:19.913 :    CSSD:2514209040: ssdgetcall: Failure to recover Stack Trace: starting frame address is (nil)
2020-01-04 10:29:19.914 :    CSSD:2514209040: clssscExit()+1860    call     kgdsdst()            
2020-01-04 10:29:19.914 :    CSSD:2514209040: clssscAssert()+210   call     clssscExit()         
2020-01-04 10:29:19.914 :    CSSD:2514209040: clssscRefFreeInt()+256  call     clssscAssert()                                                  
2020-01-04 10:29:19.914 :    CSSD:2514209040: clssgmTermMember()+206   call     clssscRefFreeInt()                                                
2020-01-04 10:29:19.914 :    CSSD:2514209040: clssgmcClientDestroy()+266  call     clssgmTermMember()   
2020-01-04 10:29:19.914 :    CSSD:2514209040: clssscHTDestroyObj()+334  call     clssgmcClientDestroy()                                         
2020-01-04 10:29:19.914 :    CSSD:2514209040: clssscHTRefDestroyObj()+50  call     clssscHTDestroyObj()                                          
2020-01-04 10:29:19.914 :    CSSD:2514209040: clssscRefFreeInt()+180   call     clssscHTRefDestroyObj()  
2020-01-04 10:29:19.914 :    CSSD:2514209040: clssgmclienteventhndlr()+2556   call     clssscRefFreeInt()                   
2020-01-04 10:29:19.914 :    CSSD:2514209040: clssscSelect()+1568  call     clssgmclienteventhndlr()     
2020-01-04 10:29:19.914 :    CSSD:2514209040: clssgmProcClientReqs()+2204  call     clssscSelect()                                     
2020-01-04 10:29:19.914 :    CSSD:2514209040: clssgmclientlsnr()+  call     clssgmProcClientReqs()                  
2020-01-04 10:29:19.914 :    CSSD:2514209040: clssscthrdmain()+26  call     clssgmclientlsnr()                                                  
2020-01-04 10:29:19.914 :    CSSD:2514209040: start_thread()+234   call     clssscthrdmain()     
...
2020-01-04 10:29:20.054 :    CSSD:840956176: [     INFO] clssnmCheckForNetworkFailure: skipping 30 defined 0 
2020-01-04 10:29:20.054 :    CSSD:840956176: [     INFO] clssnmCheckForNetworkFailure: skipping 31 defined 0 
2020-01-04 10:29:20.054 :    CSSD:840956176: [    ERROR] clssscExit: CSSD aborting from thread clssnmPollingThread
2020-01-04 10:29:20.054 :    CSSD:840956176: [     INFO] clssscExit: abort already set 1
...
2020-01-04 10:29:21.057 :    CSSD:2520246544: [     INFO] clssgmRPC: RPC(#1) to node(8298) not sent due to impending local GM shutdown
2020-01-04 10:29:21.057 :    CSSD:2520246544: [     INFO] clssgmRPC: failed to send RPC#8298 to node(1), rpcret(10), master(1), DBInfo(1), masterRPC(1),unsentRPC(0), queuing RPC to unsent queue
2020-01-04 10:29:21.057 :    CSSD:2520246544: [     INFO] clssgmDelMemCmpl: rpc 0x3ff96c83f90, ret 10, clientID 2:34:10 memberID 7:2:4
2020-01-04 10:29:21.058 :    CSSD:2511063312: [     INFO] clssgmcpMbrDeleteResp: Status -16 deleting memberID 7:2:4 from clientID 2:34:10
2020-01-04 10:29:21.288 :    CSSD:2751404192: [     INFO]   : Processing member data change type 1, size 4 for group HB+ASM, memberID 17:2:2
2020-01-04 10:29:21.288 :    CSSD:2751404192: [     INFO]   : Sending member data change to GMP for group HB+ASM, memberID 17:2:2
2020-01-04 10:29:21.288 :    CSSD:2526538000: [     INFO] clssgmpcMemberDataUpdt: grockName HB+ASM memberID 17:2:2, datatype 1 datasize 4
2020-01-04 10:29:21.288 :    CSSD:2526538000: [     INFO] clssgmpeersend: Local node is terminating, send aborted for node dx2dblpa4, number 1
2020-01-04 10:29:21.288 :    CSSD:2526538000: [  WARNING] clssgmBroadcastMap: clssgmpeersend node(1) failed - 0
2020-01-04 10:29:21.288 :    CSSD:2511063312: [     INFO] clssgmcpDataUpdtCmpl: Status 0 mbr data updt memberID 17:2:2 from clientID 2:49:2
2020-01-04 10:29:21.688 :    CSSD:2514209040: [     INFO] clssscExit: CRSD cleanup status 0
2020-01-04 10:29:21.690 :    CSSD:2514209040: [     INFO] clssscagProcAgReq: Sending response that CSSD is shutting down reason 0x0
2020-01-04 10:29:21.690 :    CSSD:2514209040: [     INFO] clssscExit: CRSD cleanup successfully completed
2020-01-04 10:29:21.691 :    CSSD:2514209040: [     INFO] clssgmKillAllClients: Adding pid 7488-7910 to the kill request
2020-01-04 10:29:21.691 :    CSSD:2514209040: [     INFO] clssgmKillAllClients: Adding pid 7812-8904 to the kill request

ohasd_cssdagent_root

 2020-01-04 10:29:33.535 [CSSDAGENT(6791)]CRS-1661: The CSS daemon is not responding. If this persists, a reboot will occur in 13960 milliseconds

OS messages LOG

Jan  4 10:29:17 anboblpa4 systemd-logind: New session 43767 of user grid.
Jan  4 10:29:17 anboblpa4 systemd: Started Session 43767 of user grid.
Jan  4 10:29:17 anboblpa4 systemd: Starting Session 43767 of user grid.
Jan  4 10:29:18 anboblpa4 systemd-logind: Removed session 43767.
Jan  4 10:29:19 anboblpa4 avahi-daemon[4537]: Withdrawing address record for 28.83.70.4 on bond0.3112.
Jan  4 10:27:14 anboblpa4 journal: Runtime journal is using 8.0M (max allowed 3.1G, trying to leave 4.0G free of 31.4G available → current limit 3.1G).
Jan  4 10:27:14 anboblpa4 kernel: Initializing cgroup subsys cpuset
Jan  4 10:27:14 anboblpa4 kernel: Initializing cgroup subsys cpu
Jan  4 10:27:14 anboblpa4 kernel: Initializing cgroup subsys cpuacct
Jan  4 10:27:14 anboblpa4 kernel: Linux version 3.10.0-693.el7.s390x (mockbuild@s390-014.build.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Thu Jul 6 20:01:53 EDT 2017
Jan  4 10:27:14 anboblpa4 kernel: setup: Linux is running natively in 64-bit mode

OSW

-- VMSTAT
zzz ***Sat Jan 4 10:29:09 CST 2020
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 3  0      0 12239196  46764 40267704    0    0   432    93    2    3  2  1 97  0  0
 1  0      0 12227552  46764 40268336    0    0   136   292 10503 16484  3  3 94  0  0
 0  0      0 12229828  46764 40267980    0    0  8368   156 10164 17217  2  1 97  0  0
zzz ***Sat Jan 4 10:29:39 CST 2020
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 3  0      0 53626732  44196 9418284    0    0   432    93    2    3  2  1 97  0  0
 0  0      0 53624900  44196 9418216    0    0    36   779 4824 8191  2  2 97  0  0
 0  0      0 53625828  44196 9418264    0    0    36   311 3262 5288  0  0 99  0  0

-- TOP
zzz ***Sat Jan 4 10:29:39 CST 2020
top - 10:29:41 up 16 days,  1:33,  1 user,  load average: 0.59, 0.80, 0.93    
Tasks: 277 total,   1 running, 276 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.0 us,  0.9 sy,  0.0 ni, 97.9 id,  0.1 wa,  0.0 hi,  0.0 si,  0.1 st
KiB Mem : 65946540 total, 53624596 free,  2859464 used,  9462480 buff/cache
KiB Swap: 33550332 total, 33550332 free,        0 used. 61802316 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 4578 node_ex+  20   0  117028  13180   5924 S   3.0  0.0  55:01.24 node_expor+
 6758 root      rt   0 1530652 189620 105952 S   2.0  0.3 309:12.92 osysmond.b+
 6791 root      rt   0 1131744 153004 107768 S   2.0  0.2  15:29.29 cssdagent
 7648 root      rt  -5 1114976 252424 104644 S   2.0  0.4  96:31.80 ologgerd
 6809 grid      rt   0 3047524 261744 112608 S   1.0  0.4  81:24.56 ocssd.bin

zzz ***Sat Jan 4 10:32:41 CST 2020 
top - 10:32:42 up 2 min,  1 user,  load average: 3.50, 1.15, 0.42       <-- the node was rebooted recently
Tasks: 454 total,   1 running, 452 sleeping,   0 stopped,   1 zombie
%Cpu(s):  2.6 us,  2.5 sy,  0.0 ni, 92.1 id,  2.5 wa,  0.1 hi,  0.2 si,  0.0 st
KiB Mem : 65946540 total, 29817876 free,  4089312 used, 32039352 buff/cache
KiB Swap: 33550332 total, 33550332 free,        0 used. 31010936 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 7633 root      20   0 1237724  60736  28932 S   2.0  0.1   0:00.91 orarootage+
10619 grid      20   0    7124   2372   1440 R   2.0  0.0   0:00.02 top
 5711 root      20   0 2900512  89540  28720 S   1.0  0.1   0:06.56 ohasd.bin
-- IFCONFIG 
zzz ***Sat Jan 4 10:29:09 CST 2020
bond0: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST>  mtu 1500
        inet6 fe80::8006:9bff:feb4:ea26  prefixlen 64  scopeid 0x20
        ether 82:06:9b:b4:ea:26  txqueuelen 1000  (Ethernet)
        RX packets 209436160  bytes 143740137716 (133.8 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 121960004  bytes 46934313850 (43.7 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

bond0.3112: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 28.83.70.2  netmask 255.255.255.0  broadcast 28.83.70.255
        inet6 fe80::8006:9bff:feb4:ea26  prefixlen 64  scopeid 0x20
        ether 82:06:9b:b4:ea:26  txqueuelen 1000  (Ethernet)
        RX packets 122078240  bytes 117461299874 (109.3 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 87350042  bytes 36732906875 (34.2 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

bond0.3112:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 28.83.70.4  netmask 255.255.255.0  broadcast 28.83.70.255   <-- this address is gone in the next sample
        ether 82:06:9b:b4:ea:26  txqueuelen 1000  (Ethernet)


zzz ***Sat Jan 4 10:29:39 CST 2020
bond0: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST>  mtu 1500
        inet6 fe80::8006:9bff:feb4:ea26  prefixlen 64  scopeid 0x20
        ether 82:06:9b:b4:ea:26  txqueuelen 1000  (Ethernet)
        RX packets 209438026  bytes 143740812908 (133.8 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 121961571  bytes 46935014086 (43.7 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

bond0.3112: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 28.83.70.2  netmask 255.255.255.0  broadcast 28.83.70.255
        inet6 fe80::8006:9bff:feb4:ea26  prefixlen 64  scopeid 0x20
        ether 82:06:9b:b4:ea:26  txqueuelen 1000  (Ethernet)
        RX packets 122080043  bytes 117461945938 (109.3 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 87351401  bytes 36733594709 (34.2 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
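
The difference between the two IFCONFIG samples is easy to miss by eye: the alias bond0.3112:1 and its address 28.83.70.4 are present at 10:29:09 and absent at 10:29:39. A minimal Python sketch of that comparison, assuming each sample has already been cut out of the OSWatcher file as a string. The function names `inet_addresses` and `vanished` are illustrative and not part of OSWatcher.

```python
import re

def inet_addresses(ifconfig_text):
    """Return {interface: set of IPv4 addresses} parsed from ifconfig output."""
    found = {}
    current = None
    for line in ifconfig_text.splitlines():
        header = re.match(r"(\S+): flags=", line)
        if header:
            current = header.group(1)
            found.setdefault(current, set())
            continue
        addr = re.match(r"\s+inet (\d+(?:\.\d+){3})", line)
        if addr and current is not None:
            found[current].add(addr.group(1))
    return found

def vanished(before_text, after_text):
    """List (interface, address) pairs present in the first sample only."""
    before = inet_addresses(before_text)
    after = inet_addresses(after_text)
    gone = []
    for iface in sorted(before):
        for ip in sorted(before[iface] - after.get(iface, set())):
            gone.append((iface, ip))
    return gone

# Trimmed copies of the 10:29:09 and 10:29:39 samples.
SAMPLE_1 = """\
bond0.3112: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 28.83.70.2  netmask 255.255.255.0  broadcast 28.83.70.255
bond0.3112:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 28.83.70.4  netmask 255.255.255.0  broadcast 28.83.70.255
"""
SAMPLE_2 = """\
bond0.3112: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 28.83.70.2  netmask 255.255.255.0  broadcast 28.83.70.255
"""

if __name__ == "__main__":
    print(vanished(SAMPLE_1, SAMPLE_2))
```

Comparing parsed address sets instead of raw text avoids false differences from the packet counters, which change in every sample.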

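Recommended logs to collect
-- note: ideally from every node

-- OS
/var/log/messages
OSW (OSWatcher) data covering the time of the problem. Systems keep getting faster, which makes it harder to establish the order of events on a timeline, so a sampling interval of 3-5 s with gzip compression is recommended.

-- GI
su - grid
cd $ORACLE_BASE/diag/crs/*/crs/trace
ls -l alert*.log
ls -l crsd.trc
ls -l ocssd.trc
ls -l ohasd_cssdagent_root.trc

-- DB
su - oracle
cd $ORACLE_BASE/diag/rdbms/*/$ORACLE_SID/trace

ls -l alert*.log
ls -l *dbw*.trc
ls -l *lmhb*.trc
trace files of any process involved in the crash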
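
The checklist above can be turned into a small collection script so the same files are gathered from each node. This is a rough Python sketch, not an Oracle tool: `trace_patterns` and `bundle_logs` are hypothetical helpers, and the grid and database ORACLE_BASE values are passed in as arguments because they usually differ.

```python
import glob
import os
import tarfile

def trace_patterns(grid_base, db_base, oracle_sid):
    """Glob patterns for the files named in the checklist (paths are assumptions)."""
    gi = os.path.join(grid_base, "diag", "crs", "*", "crs", "trace")
    db = os.path.join(db_base, "diag", "rdbms", "*", oracle_sid, "trace")
    return [
        "/var/log/messages",
        os.path.join(gi, "alert*.log"),
        os.path.join(gi, "crsd*.trc"),
        os.path.join(gi, "*cssd*.trc"),   # CSSD daemon and cssdagent traces
        os.path.join(db, "alert*.log"),
        os.path.join(db, "*dbw*.trc"),
        os.path.join(db, "*lmhb*.trc"),
    ]

def bundle_logs(patterns, archive_path):
    """Pack every existing file matching the patterns into a .tar.gz archive."""
    files = []
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            if path not in files and os.path.isfile(path):
                files.append(path)
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in files:
            # Keep the directory layout so files from different homes do not collide.
            tar.add(path, arcname=path.lstrip(os.sep))
    return files
```

A call such as `bundle_logs(trace_patterns("/u01/app/grid", "/u01/app/oracle", "orcl1"), "/tmp/node1_logs.tar.gz")` would be run once per node; the three argument values here are placeholders.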

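Avahi (the avahi daemon)

Avahi (the avahi daemon) is a free implementation of Apple's "ZeroConf" program. It uses the mDNS / DNS-SD protocol suite for service discovery and scans the network for printers and other shared resources. It can also provide DNS and DHCP services. That is very useful on a laptop, but it is not needed on a server, where it can degrade network performance and become unstable under heavy load. Avahi brings no benefit to Oracle, and disabling the service is recommended in Oracle environments. Oracle has reported that avahi-daemon can interfere with the multicast heartbeat of Oracle RAC, causing the application-layer interface to assume that it has been disconnected on a node and to restart.

The avahi daemon can also spin or accumulate high CPU time
Bug# 14027941 Possible rac eviction avahi-daemon: withdrawing address record
Bug# 14739888 – CSSD FROM NODE 2 CANNOT JOIN THE CLUSTER AFTER REBOOT
Bug# 12717666 : LNX64-11203 CVU TO CHECK IF AVAHI-DAEMON IS DISABLED AND PROVIDE FIX-UP SCRIPT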

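The OS messages log recorded:
anboblpa4 avahi-daemon[4537]: Withdrawing address record for 28.83.70.4 on bond0.3112
According to reports found online, a "Withdrawing address record" message only means that avahi-daemon noticed the IP address being removed; avahi-daemon did not initiate the removal.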
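
To line these messages up against the CSSD and OSW timelines, /var/log/messages can be filtered for them. A minimal Python sketch, assuming the syslog line layout seen in this case (month, day, time, host, `avahi-daemon[pid]:`); `withdrawn_addresses` is an illustrative name, and other syslog formats would need a different pattern.

```python
import re

WITHDRAW_RE = re.compile(
    r"^(?P<when>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"avahi-daemon\[\d+\]: Withdrawing address record for "
    r"(?P<ip>\S+) on (?P<iface>\S+?)\.?$"
)

def withdrawn_addresses(lines):
    """Return (timestamp, host, ip, interface) for each avahi withdrawal message."""
    hits = []
    for line in lines:
        m = WITHDRAW_RE.match(line.rstrip())
        if m:
            hits.append((m.group("when"), m.group("host"),
                         m.group("ip"), m.group("iface")))
    return hits

# Two lines taken from the OS messages log of this case.
SAMPLE = [
    "Jan  4 10:29:18 anboblpa4 systemd-logind: Removed session 43767.",
    "Jan  4 10:29:19 anboblpa4 avahi-daemon[4537]: Withdrawing address record for 28.83.70.4 on bond0.3112.",
]

if __name__ == "__main__":
    for hit in withdrawn_addresses(SAMPLE):
        print(hit)
```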

Disabling Avahi
To stop the avahi-daemon, for OL5/OL6:

# service avahi-dnsconfd stop # ignore any errors
# service avahi-daemon stop
# chkconfig avahi-dnsconfd off
# chkconfig avahi-daemon off

for OL7:

# systemctl stop avahi-daemon.socket avahi-daemon.service
# systemctl disable avahi-daemon.socket avahi-daemon.service

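Note: this case also shows another very bad practice. Oracle explicitly states that network interface names should not contain a ".".

library cache lock, row cache lock, or Failed Logon Delay caused by wrong password attempts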


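To protect against repeated wrong-password logons and brute-force attacks, Oracle 11g introduced a delay on failed password verification. The idea is sound, but the feature has become a source of problems in its own right. If the user profile allows unlimited failed attempts without locking the account, or if an application user's database password is changed and some application configuration is missed and keeps using the old password, the failed logons can cause a performance problem. The wait event seen differs by version: row cache lock in Oracle 10g R2 and 11g R1, library cache lock in 11g R2, and Failed Logon Delay in 12c.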

Symptoms
1, ASH and AWR show "Connection Management", with top call "OAUTH"
2, DBA_HIST_ACTIVE_SESS_HISTORY and V$ACTIVE_SESSION_HISTORY show TOP_LEVEL_CALL_NAME='OAUTH'
3, row cache lock waits for DC_USERS
4, Call Stack contains one of the following functions:

kziavua
kziaia
kziasfc

5, Checking the exclusive holder from DBA_DDL_LOCKS, a session may be seen holding a lock type (kglhdnsp) 79 on object (kglnaobj) 5:

SQL> select * from dba_ddl_locks where mode_held='Exclusive';
SESSION_ID OWNER NAME      TYPE MODE_HELD   MODE_REQU
---------- --------- ---------- ---------- ----------
612                5         79 Exclusive        None

Finding the application with failed logons
1, LOGON TRIGGER
How to find out who caused the database user locked(ora-1017 or ORA-28000) (capturing failed logons)

2, AUDIT trail

select username, os_username, userhost, client_id, trunc(timestamp), count(*) failed_logins 
from dba_audit_trail 
where returncode = 1017 and timestamp > sysdate -1 
group by username, os_username, userhost, client_id, trunc(timestamp);

Collecting information

$sqlplus '/ as sysdba'
oradebug setmypid
oradebug unlimit
oradebug dump systemstate 266

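In the trace file, go to the section for the waiting process and search for the keyword "waiting for" to find the handle address. Then search for that address: under the state object whose owner is the process's "calls cur" value you will find the locked object and the requested mode. Keep searching for the same handle address to find the other sessions that currently hold the handle.

Alternatively, query x$kgllk, where kgllkhdl is the handle and kgllkses is the session address.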
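
The first step of that search, finding which sessions wait on which handle, can be scripted when the systemstate dump is large. A rough Python sketch, assuming the trace layout seen in this case, where a `(session) sid:` line precedes the wait stack and the `handle address=` line directly follows the `waiting for` line. The layout varies between versions, and `waiters_by_handle` is an illustrative name.

```python
import re
from collections import defaultdict

SID_RE = re.compile(r"\(session\) sid: (\d+)")
WAIT_RE = re.compile(r"waiting for '([^']+)'")
HANDLE_RE = re.compile(r"handle address=(0x[0-9a-fA-F]+)")

def waiters_by_handle(trace_text, event="library cache lock"):
    """Return {handle address: [sid, ...]} for sessions waiting on `event`."""
    waiters = defaultdict(list)
    sid = None
    pending = False  # True only on the line right after a matching "waiting for"
    for line in trace_text.splitlines():
        m = SID_RE.search(line)
        if m:
            sid = int(m.group(1))
            pending = False
            continue
        m = WAIT_RE.search(line)
        if m:
            pending = (m.group(1) == event)
            continue
        if pending:
            m = HANDLE_RE.search(line)
            if m:
                waiters[m.group(1)].append(sid)
            pending = False
    return dict(waiters)

# Excerpt in the shape of a systemstate dump session section.
SAMPLE = """\
    (session) sid: 1417 ser: 32923 trans: 0x700000fc6a3f890, creator: 0x700000fd0aadce8
    Current Wait Stack:
     0: waiting for 'library cache lock'
        handle address=0x70000100d60d478, lock address=0x700001004b94110, 100*mode+namespace=0x4f0003
"""

if __name__ == "__main__":
    print(waiters_by_handle(SAMPLE))
```

Handles with many waiting sids are the ones worth searching for by hand to find the holder.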

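Solution
The real fix is to find the application that keeps trying the wrong password and correct its password.
In Oracle 11g 11.1.0.7 the wait cannot be disabled, so the only option is to fix the application password promptly. In 11.2 and later you can also try disabling the delayed password verification feature.
The event can be set as follows: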

-- spfile
alter system set event ="28401 TRACE NAME CONTEXT FOREVER, LEVEL 1" scope=spfile;
-- or --
-- memory
alter system set events '28401 TRACE NAME CONTEXT FOREVER, LEVEL 1';

About logon delay

After 3 successive failures a sleep delay is introduced starting
at 1 second and extending to 10 seconds max. During each delay
the user X row cache lock is held in exclusive mode preventing
any concurrent logon attempt as user X (and preventing any
other operation which would need the row cache lock for user X).

Case

PROCESS 39:
  ----------------------------------------
  SO: 0x700000fd0aadce8, type: 2, owner: 0x0, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
   proc=0x700000fd0aadce8, name=process, file=ksu.h LINE:12616 ID:, pg=0
  (process) Oracle pid:39, ser:238, calls cur/top: 0x700000f35836778/0x700000f35836778
            flags : (0x0) -
            flags2: (0x0),  flags3: (0x10)
            intr error: 0, call error: 0, sess error: 0, txn error 0
            intr queue: empty
    ksudlp FALSE at location: 0
  (post info) last post received: 0 0 80
              last post received-location: kji.h LINE:3418 ID:kjata: wake up enqueue owner
              last process to post me: 700000fc8aa46e8 1 6
              last post sent: 0 0 0
              last post sent-location: No post
              last process posted by me: none
    (latch info) wait_event=0 bits=0
    Process Group: DEFAULT, pseudo proc: 0x700000fc0b81778
    O/S info: user: grid, term: UNKNOWN, ospid: 26477280
    OSD pid info: Unix process pid: 26477280, image: oracle@ANBOB2
    Short stack dump:
ksedsts()+360<-ksdxfstk()+44<-ksdxcb()+3384<-sspuser()+116<-48bc<-sskgpwwait()+32<-skgpwwait()+180<-ksliwat()+11032<-kslwaitctx()+180<-kjusuc()+3652<-ksipgetctxi()+1892<-kqlmLock()+1296<-kqlmClusterLo
ck()+256<-kgllkal()+1984<-kglLock()+1276<-kglget()+264<-kziasfc()+1836<-kpolnb()+6840<-kpoauth()+672<-opiodr()+720<-ttcpip()+1028<-opitsk()+1508<-opiino()+940<-opiodr()+720<-opidrv()+1132<-sou2o()+136
<-opimai_real()+608<-ssthrdmain()+268<-main()+204<-__start()+112
    ----------------------------------------
    SO: 0x700000fd8e2a3e8, type: 4, owner: 0x700000fd0aadce8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
     proc=0x700000fd0aadce8, name=session, file=ksu.h LINE:12624 ID:, pg=0
    (session) sid: 1417 ser: 32923 trans: 0x700000fc6a3f890, creator: 0x700000fd0aadce8
              flags: (0x41) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
              flags2: (0x9) -/-/INC
              DID: , short-term DID:
              txn branch: 0x0
              oct: 0, prv: 0, sql: 0x0, psql: 0x0, user: 0/SYS
    ksuxds FALSE at location: 0
    service name: anbob
    client details:
      O/S info: user: , term: , ospid: 1234
      machine: bao-176 program:
    Current Wait Stack:
     0: waiting for 'library cache lock'
        handle address=0x70000100d60d478, lock address=0x700001004b94110, 100*mode+namespace=0x4f0003
        wait_id=7 seq_num=8 snap_id=1
        wait times: snap=2 min 58 sec, exc=2 min 58 sec, total=2 min 58 sec
        wait times: max=infinite, heur=2 min 58 sec
        wait counts: calls=359 os=359
        in_wait=1 iflags=0x15a2
    There is at least one session blocking this session.
      Dumping 1 direct blocker(s):
        inst: 2, sid: 441, ser: 33695
      Dumping final blocker:
        inst: 2, sid: 441, ser: 33695
    Wait State:
      fixed_waits=0 flags=0x22 boundary=0x0/-1
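
The third wait parameter in the dump above, `100*mode+namespace=0x4f0003`, can be unpacked to confirm that this is the logon-related lock. The Python sketch below assumes the packing in which the upper 16 bits hold the namespace and the lower 16 bits the mode; that assumption is consistent with the lock type 79 shown in the DBA_DDL_LOCKS output earlier, but it is not taken from Oracle documentation. `decode_lock_p3` is an illustrative name.

```python
# Library cache lock modes as commonly listed for KGL locks.
LOCK_MODES = {1: "null", 2: "shared", 3: "exclusive"}

def decode_lock_p3(p3):
    """Split the packed value into (namespace, mode); assumes 16-bit fields."""
    namespace = p3 >> 16
    mode = p3 & 0xFFFF
    return namespace, mode

if __name__ == "__main__":
    ns, mode = decode_lock_p3(0x4F0003)
    print(ns, mode, LOCK_MODES.get(mode, "unknown"))
```

For the value in this case the result is namespace 79 and mode 3, that is, an exclusive request on the same namespace the blocker holds.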

Ogg 12.3 PROCESS ABENDING with “OGG-01224 Address already in use”


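All Oracle GoldenGate (OGG) 12.3 processes suddenly abended (PROCESS ABENDING). AUTOSTART was configured in MGR, so they were started again automatically and recovered, but the error log contained OGG-01224 Address already in use.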

Oracle GoldenGate Command Interpreter for Oracle
Version 12.3.0.1.4 OGGCORE_12.3.0.1.0_PLATFORMS_180415.0359_FBO
Linux, x64, 64bit (optimized), Oracle 12c on Apr 16 2018 00:53:30
Operating system character set identified as UTF-8.

Copyright (C) 1995, 2018, Oracle and/or its affiliates. All rights reserved.

-- show ggserr.log

2020-01-19T14:11:42.769+0800  INFO    OGG-02756  Oracle GoldenGate Capture for Oracle, dep1.prm:  The definition for table GGADMIN.GG_HEARTBEAT is obtained from the trail file.
2020-01-19T14:11:42.839+0800  INFO    OGG-02756  Oracle GoldenGate Capture for Oracle, dep1.prm:  The definition for table GGADMIN.GG_HEARTBEAT_SEED is obtained from the trail file.
2020-01-19T14:11:57.456+0800  WARNING OGG-01223  Oracle GoldenGate Manager for Oracle, mgr.prm:  Connection reset by peer.
2020-01-19T14:12:06.683+0800  WARNING OGG-01223  Oracle GoldenGate Capture for Oracle, d2_xxxxb.prm:  .
2020-01-19T14:12:13.435+0800  WARNING OGG-01223  Oracle GoldenGate Capture for Oracle, ext2.prm:  .
2020-01-19T14:12:14.985+0800  WARNING OGG-01223  Oracle GoldenGate Capture for Oracle, ext1.prm:  .
2020-01-19T14:12:22.695+0800  WARNING OGG-01223  Oracle GoldenGate Capture for Oracle, d2_xxxx.prm:  .
2020-01-19T14:12:24.002+0800  WARNING OGG-01223  Oracle GoldenGate Capture for Oracle, d2_xxxxd.prm:  .
2020-01-19T14:12:25.978+0800  INFO    OGG-01971  Oracle GoldenGate Capture for Oracle, ext1.prm:  The previous message, 'WARNING OGG-01223', repeated 1 times.
2020-01-19T14:12:25.978+0800  ERROR   OGG-01224  Oracle GoldenGate Capture for Oracle, ext1.prm:  Address already in use.
2020-01-19T14:12:26.029+0800  INFO    OGG-01971  Oracle GoldenGate Capture for Oracle, d2_xxxxd.prm:  The previous message, 'WARNING OGG-01223', repeated 1 times.
2020-01-19T14:12:26.037+0800  ERROR   OGG-01224  Oracle GoldenGate Capture for Oracle, d2_xxxxd.prm:  Address already in use.
2020-01-19T14:12:26.037+0800  ERROR   OGG-01668  Oracle GoldenGate Capture for Oracle, d2_xxxxd.prm:  PROCESS ABENDING.
2020-01-19T14:12:26.121+0800  WARNING OGG-01223  Oracle GoldenGate Capture for Oracle, dep1.prm:  .
2020-01-19T14:12:26.260+0800  INFO    OGG-01971  Oracle GoldenGate Capture for Oracle, d2_xxxxa.prm:  The previous message, 'WARNING OGG-01223', repeated 1 times.
2020-01-19T14:12:26.411+0800  INFO    OGG-01971  Oracle GoldenGate Capture for Oracle, ext2.prm:  The previous message, 'WARNING OGG-01223', repeated 1 times.
2020-01-19T14:12:26.411+0800  ERROR   OGG-01224  Oracle GoldenGate Capture for Oracle, ext2.prm:  Address already in use.
2020-01-19T14:12:26.721+0800  WARNING OGG-01223  Oracle GoldenGate Capture for Oracle, d2_xxxxc.prm:  .
2020-01-19T14:12:26.723+0800  INFO    OGG-01971  Oracle GoldenGate Capture for Oracle, d2_xxxx.prm:  The previous message, 'WARNING OGG-01223', repeated 1 times.
2020-01-19T14:12:26.723+0800  ERROR   OGG-01224  Oracle GoldenGate Capture for Oracle, d2_xxxx.prm:  Address already in use.
2020-01-19T14:12:26.723+0800  ERROR   OGG-01668  Oracle GoldenGate Capture for Oracle, d2_xxxx.prm:  PROCESS ABENDING.
2020-01-19T14:12:26.804+0800  WARNING OGG-01223  Oracle GoldenGate Capture for Oracle, d2_im.prm:  .
2020-01-19T14:12:26.830+0800  ERROR   OGG-01224  Oracle GoldenGate Capture for Oracle, d2_im.prm:  Address already in use.
2020-01-19T14:12:26.830+0800  ERROR   OGG-01668  Oracle GoldenGate Capture for Oracle, d2_im.prm:  PROCESS ABENDING.
2020-01-19T14:12:26.850+0800  INFO    OGG-01971  Oracle GoldenGate Delivery for Oracle, r1.prm:  The previous message, 'WARNING OGG-01223', repeated 1 times.
2020-01-19T14:12:26.868+0800  INFO    OGG-01971  Oracle GoldenGate Capture for Oracle, d2_xxxxb.prm:  The previous message, 'WARNING OGG-01223', repeated 1 times.
2020-01-19T14:12:26.868+0800  ERROR   OGG-01224  Oracle GoldenGate Capture for Oracle, d2_xxxxb.prm:  Address already in use.
2020-01-19T14:12:26.868+0800  ERROR   OGG-01668  Oracle GoldenGate Capture for Oracle, d2_xxxxb.prm:  PROCESS ABENDING.
2020-01-19T14:12:27.005+0800  ERROR   OGG-01668  Oracle GoldenGate Capture for Oracle, ext2.prm:  PROCESS ABENDING.
2020-01-19T14:12:27.056+0800  ERROR   OGG-01668  Oracle GoldenGate Capture for Oracle, ext1.prm:  PROCESS ABENDING.
2020-01-19T14:12:57.969+0800  INFO    OGG-00975  Oracle GoldenGate Manager for Oracle, mgr.prm:  EXTRACT D2_SMS starting.

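Port-scanning software run by the security team can affect OGG processes. When a process receives traffic on its port from something that is not an Oracle GoldenGate process, OGG treats it as an attack and immediately abends the process to prevent any intrusion into the product.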
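
When many groups are involved, it helps to list which ones logged OGG-01224 and then abended, so the times can be matched against the scanner's schedule. A minimal Python sketch, assuming the ggserr.log line layout shown above (timestamp, level, message code, component, parameter file name); `abended_after` is an illustrative helper, not an OGG utility.

```python
import re

LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<level>INFO|WARNING|ERROR)\s+(?P<code>OGG-\d+)\s+"
    r".*?, (?P<group>\S+)\.prm:\s+(?P<text>.*)$"
)

def abended_after(lines, trigger="OGG-01224", abend="OGG-01668"):
    """Return the parameter-file names that logged both `trigger` and `abend`."""
    seen = {}
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            seen.setdefault(m.group("group"), set()).add(m.group("code"))
    return sorted(g for g, codes in seen.items() if {trigger, abend} <= codes)

# A few lines copied from the ggserr.log excerpt of this case.
SAMPLE = [
    "2020-01-19T14:12:26.037+0800  ERROR   OGG-01224  Oracle GoldenGate Capture for Oracle, d2_xxxxd.prm:  Address already in use.",
    "2020-01-19T14:12:26.037+0800  ERROR   OGG-01668  Oracle GoldenGate Capture for Oracle, d2_xxxxd.prm:  PROCESS ABENDING.",
    "2020-01-19T14:12:26.121+0800  WARNING OGG-01223  Oracle GoldenGate Capture for Oracle, dep1.prm:  .",
    "2020-01-19T14:12:26.411+0800  ERROR   OGG-01224  Oracle GoldenGate Capture for Oracle, ext2.prm:  Address already in use.",
]

if __name__ == "__main__":
    print(abended_after(SAMPLE))
```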

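You can ask the security team to exclude the OGG ports from scanning, but that is often not possible. The problem does not exist before OGG 12.3; in 12.2 it shows up in a different form, as described in my earlier blog post "OGG-01022 Unknown N bytes message received & OGG-01223 Connection reset by peer". Oracle has confirmed this as an OGG bug.
Bug 28011195 – Port scans cause extract / replicat abend with OGG-1224 Address already in use.

The fix is to download and install the patch for your version, for example for our current version 12.3.0.1.4:
Patch 28225926: Patch for BLR 28011195: Linux x86-64: Oracle 12c: OGG 12.3.0.1.4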

The novel coronavirus is here: life really is fragile, and what use is advanced IT?


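2019 was "too hard", and it seems things can always get harder. 2020 had barely begun when, during the Spring Festival, a sudden novel coronavirus outbreak engulfed all of Wuhan (Hubei). Because of the Spring Festival travel rush, a mass human migration, and because China's rapid economic growth in recent years has made transport more convenient and given people more choice in where they work, the whole country was hit within a short time, with no province spared, and the virus spread abroad as well. The economy has developed at high speed, but are we prepared in other respects?

This year I spent the holiday in my home village. On New Year's Eve the village head announced the instructions from above. First, for epidemic control there would be no door-to-door New Year visits and kowtowing on the first day of the new year. I was delighted, mainly because I am only home for a few days and spend nearly every one of them kowtowing to someone, so the mere mention of going home for the New Year makes me anxious. But village customs do not change overnight: before dawn on the first day I was dragged out of bed and made the rounds anyway, kowtowing dozens of times. Second, our city has the worst air quality (everyone can guess which city that is), so fireworks and firecrackers were banned entirely, with police and drones reportedly sent out to catch offenders. On New Year's Eve the law still could not punish everyone, but over the holiday there was less than a tenth of the usual fireworks.

From the third day of the new year, level-one emergency measures stopped public transport and the village roads were sealed. I had planned to return to the city on the fourth day so as not to get stuck and miss work, but on the afternoon of the third day my child started running a fever of 38.2 °C. My nerves tightened, although we had had no contact with anyone from outside the area. Medical facilities in the village are worse than a small pharmacy elsewhere: the level of care is poor and there are hardly any medicines. With the roads sealed and the virus going around, the county town was too far away and carried a risk of cross-infection. In the end we went from village to village along field paths and had to plead with the people guarding the entrance of the neighbouring village to be let through. The village doctor there looked astonished, and his first words were "How did you get here?" rather than "What seems to be the problem?". There was no way to do a blood test, let alone an X-ray. He listened to the chest, asked where we had come back from and whether anyone around us had other symptoms, and said little else. The thermometer read 39.5 °C. "Let's bring the fever down first," he said, gave us two days of medicine, a packet of antibiotics and a packet of plain white medicinal powder, and sent us home.

Back home we watched the symptoms constantly and looked up online how the coronavirus presents. We isolated from the rest of the family that same day and ate separately. The fever went down, but on the third day it returned, now with a cough and a runny nose. The child was still lively and able to play, so flu seemed more likely, and we hurried back to the doctor in the neighbouring village. Again: "How did you get here this time? Is nobody at the village entrance? They let you in?" Poor helpless doctor, we all know what you were thinking. This time he gave an injection and changed some of the medicine. I said that, judging by the child's illnesses over the past two years, this was probably bronchitis caused by a virus, was that right? He nodded. I asked whether nebulization was possible. He said yes, did we want it? I said yes. His nebulizer was not as good as the one I had bought myself for over 1,000 yuan, but for children who often have bronchial problems nebulization really does work well. The child was thoroughly uncooperative, but we got through it, went home, took the medicine on schedule for two days, and there was a clear improvement. My thanks to that village doctor.

Over the last couple of days, though, I have started to show cold symptoms myself: headache and a runny nose, no cough and, fortunately, no fever. I plan to ride it out.

There is a question I keep coming back to. Technology companies say they have reached a world-class level, with all kinds of AI, deep learning, machine learning, neural networks, big data and cloud computing, and even "human repair centres" supposedly not far off. Why has medicine not advanced as fast? Modelling disease, prevention, rapid diagnosis, narrowing the gap in medical care between city and countryside: could a single drop of blood not be analysed to quickly produce something like an AWR report? Half a year ago I raised this over dinner with doctors from the provincial medical university. They said medicine is simply very hard. Immune responses differ from person to person, and the same drug for the same disease can act differently in different people. Everything is "online", with no room for error (given today's doctor-patient relations, a mistake might even get your throat cut by the patient's family), and there is no real test environment. Still, some technology companies are already working on using technology to raise the standard of medicine. I have a certain longing for a medical career myself: medical workers reduce the suffering that illness brings, and their contribution to families and society is more fulfilling than that of any other profession and deserves more respect.

Who is the real master of this planet

Microbes and bacteria have existed for billions of years, while we humans have been around for only a little over ten thousand. Humans are just a small part of all living things. We need to respect nature, accept the existence of its creatures and everything it provides, and keep the whole ecosystem in balance. Human over-exploitation has destroyed the habitats of some animals and driven them into the areas where people live, bringing along what they carry, such as viruses. Some reckless people even eat wild animals indiscriminately. Some viruses should have died out with their hosts; instead, once the host is eaten, the virus finds a better environment to survive in and may mutate in the human body, setting off an unknown epidemic. SARS, Ebola, avian influenza and the novel coronavirus should be a wake-up call about who really is the master of this planet. Do not destroy this ecosystem.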

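Symptoms

Common cold: runny nose, nasal congestion, low to moderate fever lasting 1-2 days, illness lasting about 3 days, fatigue not pronounced;

Influenza: high fever above 39 °C within a day, headache, fatigue, clear runny nose, cough with phlegm, fever for 3-5 days, illness lasting about 1 week

Novel coronavirus: low to high fever, dry cough, fatigue, diarrhoea, difficulty breathing; a runny nose is uncommon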

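Routine blood test (from the internet)

  1. White blood cells

A high white cell count may indicate inflammation caused by a bacterial infection

Raised lymphocytes suggest a viral infection

Raised neutrophils suggest a bacterial infection

  2. Red blood cells

Haemoglobin: used to judge whether there is anaemia

Mean corpuscular volume: used to judge deficiency of iron, vitamin B12 or folic acid

  3. Platelets

Low platelets: impaired clotting function

  4. C-reactive protein (CRP)

CRP is an inflammation marker; above 50mg/dL a bacterial infection is more likely

  5. Early blood picture of the novel coronavirus

White cell count normal, neutrophils raised, lymphocytes markedly reduced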

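Related popular-science material

How people with mild suspected 2019 novel coronavirus infection and their close contacts should manage themselves | 科普小课堂 [41] · 协和呼吸

Everything about the novel coronavirus pneumonia | 回形针

Reflections

After this catastrophe, will humans go on damaging the ecosystem? Is it the bats' fault, or ours? Eating bats out of sheer boredom: do you damn well want to sleep hanging upside down?

Transparency of information, the Red Cross, oversight: donations must not be tainted again.

Were the lessons of SARS ever summed up and applied in this fight? The presence of the virus in faeces should not have been first reported from another country's first case.

Was public communication adequate?

Are the national public medical reserves sufficient to cope with an epidemic like this?

Are some people in key posts not up to their titles, pushing things like the "Shuanghuanglian fairy"?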

How to diagnose redo/archive log generation growth (and reduce redo generation)


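Redo records every change made to the database, including all changes to data files, but not changes to the control files or the parameter file. Redo is first written to the online redo log files; in ARCHIVELOG mode an offline archived log file is produced once an online log fills up. A redo log file is made up of fixed-size blocks. The block size is set when the file is created and defaults to 512 bytes on Linux and Solaris and 1024 bytes on HP-UX Itanium. Every redo log file starts with a standard file header recording the dbname, thread, compatibility version, time, SCN and so on, followed by redo blocks, whose logical structure consists of a redo header and redo records. A redo block can contain several redo records, and a redo record consists of one or more change vectors; the details can be seen by dumping a log file and reading the trace. LGWR does not fill every redo block completely: by design, after a write it skips the rest of a partially filled block, so some space is wasted. The amount can be calculated from the "redo wastage" system statistic, and high redo wastage is usually caused by very frequent commits.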
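
The effect of block rounding can be illustrated with a little arithmetic. The Python sketch below is a simplification under stated assumptions: it ignores the per-block header, treats each LGWR write as rounded up to a whole redo block, and expresses wastage as a share of "redo size" plus "redo wastage" (other write-ups divide by "redo size" alone). `padding_bytes` and `wastage_pct` are illustrative names.

```python
def padding_bytes(write_bytes, block_size=512):
    """Bytes skipped when a write is rounded up to a whole redo block."""
    return -write_bytes % block_size

def wastage_pct(redo_wastage, redo_size):
    """Wasted share of redo volume, from the 'redo wastage' and 'redo size' statistics."""
    total = redo_wastage + redo_size
    return 0.0 if total == 0 else 100.0 * redo_wastage / total

if __name__ == "__main__":
    # A 700-byte write occupies two 512-byte blocks and leaves 324 bytes unused.
    print(padding_bytes(700))
    # Statistic values here are made-up inputs, not measurements.
    print(wastage_pct(100, 900))
```

Many small commits mean many small writes, each paying this padding, which is why frequent commits push the statistic up.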

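Sometimes the archive volume suddenly jumps on a particular day and the cause has to be found; some sites even make reducing redo generation a tuning goal in its own right. Below are some ideas for analysing archive generation and what changed.

Show daily archive generation / log switch counts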

-- log_switch.sql --
set linesize 200 pagesize 1000
column day format a3
column total format 9999
column h00 format 999
column h01 format 999
column h02 format 999
column h03 format 999
column h04 format 999
column h05 format 999
column h06 format 999
column h07 format 999
column h08 format 999
column h09 format 999
column h10 format 999
column h11 format 999
column h12 format 999
column h13 format 999
column h14 format 999
column h15 format 999
column h16 format 999
column h17 format 999
column h18 format 999
column h19 format 999
column h20 format 999
column h21 format 999
column h22 format 999
column h23 format 999
break on report
compute max of "total" on report
compute max of "h00" on report
compute max of "h01" on report
compute max of "h02" on report
compute max of "h03" on report
compute max of "h04" on report
compute max of "h05" on report
compute max of "h06" on report
compute max of "h07" on report
compute max of "h08" on report
compute max of "h09" on report
compute max of "h10" on report
compute max of "h11" on report
compute max of "h12" on report
compute max of "h13" on report
compute max of "h14" on report
compute max of "h15" on report
compute max of "h16" on report
compute max of "h17" on report
compute max of "h18" on report
compute max of "h19" on report
compute max of "h20" on report
compute max of "h21" on report
compute max of "h22" on report
compute max of "h23" on report
compute sum of NUM on report
compute sum of GB on report
compute sum of MB on report
compute sum of KB on report

REM Script to Report the Redo Log Switch History


alter session set nls_date_format='DD MON YYYY';
select thread#, trunc(first_time) as "date", to_char(first_time,'Dy') as "Day", count(1) as "total",
sum(decode(to_char(first_time,'HH24'),'00',1,0)) as "h00",
sum(decode(to_char(first_time,'HH24'),'01',1,0)) as "h01",
sum(decode(to_char(first_time,'HH24'),'02',1,0)) as "h02",
sum(decode(to_char(first_time,'HH24'),'03',1,0)) as "h03",
sum(decode(to_char(first_time,'HH24'),'04',1,0)) as "h04",
sum(decode(to_char(first_time,'HH24'),'05',1,0)) as "h05",
sum(decode(to_char(first_time,'HH24'),'06',1,0)) as "h06",
sum(decode(to_char(first_time,'HH24'),'07',1,0)) as "h07",
sum(decode(to_char(first_time,'HH24'),'08',1,0)) as "h08",
sum(decode(to_char(first_time,'HH24'),'09',1,0)) as "h09",
sum(decode(to_char(first_time,'HH24'),'10',1,0)) as "h10",
sum(decode(to_char(first_time,'HH24'),'11',1,0)) as "h11",
sum(decode(to_char(first_time,'HH24'),'12',1,0)) as "h12",
sum(decode(to_char(first_time,'HH24'),'13',1,0)) as "h13",
sum(decode(to_char(first_time,'HH24'),'14',1,0)) as "h14",
sum(decode(to_char(first_time,'HH24'),'15',1,0)) as "h15",
sum(decode(to_char(first_time,'HH24'),'16',1,0)) as "h16",
sum(decode(to_char(first_time,'HH24'),'17',1,0)) as "h17",
sum(decode(to_char(first_time,'HH24'),'18',1,0)) as "h18",
sum(decode(to_char(first_time,'HH24'),'19',1,0)) as "h19",
sum(decode(to_char(first_time,'HH24'),'20',1,0)) as "h20",
sum(decode(to_char(first_time,'HH24'),'21',1,0)) as "h21",
sum(decode(to_char(first_time,'HH24'),'22',1,0)) as "h22",
sum(decode(to_char(first_time,'HH24'),'23',1,0)) as "h23"
from
v$archived_log
where first_time > trunc(sysdate-10)
group by thread#, trunc(first_time), to_char(first_time, 'Dy') order by 2,1;

REM Script to calculate the archive log size generated per day for each Instances.

select THREAD#, trunc(first_time) as "DATE"
, count(1) num
, trunc(sum(blocks*block_size)/1024/1024/1024) as GB
, trunc(sum(blocks*block_size)/1024/1024) as MB
, sum(blocks*block_size)/1024 as KB
from v$archived_log
where first_time > trunc(sysdate-10)
group by thread#, trunc(first_time)
order by 2,1
;

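Causes of a sudden increase in redo

  • Change in business volume: more DML or more block changes
  • Column lengths or the length of the stored data changed
  • Table type changed, e.g. TEMP
  • New tables added
  • Database supplemental logging enabled
  • New indexes created
  • Excessive commits, causing wastage
  • Redo group block size changed
  • The way loops or bulk-modification SQL are written changed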

 

Ways to analyse redo

1, LogMiner (note: without supplemental logging, IOTs will not be visible)
2, block_changes in v$sess_io, joined to v$session
3, undo usage in v$transaction: USED_UBLK and USED_UREC
4, block changes and gets/executions in AWR (dba_hist_seg_stat)
5, DML counts in DBA_TAB_MODIFICATIONS

 

How to reduce redo generation

1, nologging
– direct load (SQL*Loader)
– direct-load INSERT
– CREATE TABLE … AS SELECT
– CREATE INDEX
– ALTER TABLE … MOVE PARTITION
– ALTER TABLE … SPLIT PARTITION
– ALTER INDEX … SPLIT PARTITION
– ALTER INDEX … REBUILD
– ALTER INDEX … REBUILD PARTITION
– INSERT, UPDATE, and DELETE on LOBs in NOCACHE NOLOGGING mode stored out of line
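2, Drop indexes that are not needed
3, For large data loads, drop the indexes first and rebuild them after the load
4, Defragment: fewer table blocks means less redo
5, Recreate some tables as IOTs
6, Increase the cache of the sequences.
7, Use global temporary tables
8, Refresh materialized views less often
9, Avoid unnecessary updates
10, Use smaller column data type precision
11, Use CHAR sparingly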

 

Related MOS notes

Document:832504.1 – Excessive Archives / Redo Logs Generation due to AWR / ASH – Troubleshooting
Document:167492.1 – How to Find Sessions Generating Lots of Redo
Document:300395.1 – How To Determine The Cause Of Lots Of Redo Generation Using LogMiner
Document:199298.1 – Diagnosing excessive redo generation
Document:69739.1 – How to Turn Archiving ON and OFF in Oracle RDBMS
Document: 188691.1 : How to Avoid Generation of Redolog Entries

oracle fast split partition


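When a partition is split into two and one of the resulting partitions is empty while the non-empty one keeps the same storage attributes as the original partition, no data is moved. The split is done internally by swapping the data_object_id, and the existing global and partitioned indexes stay USABLE throughout. This feature is called fast split partition. It is an old feature dating from 9.2; IOTs have been supported since 10.2.

Besides one resulting partition being empty, accurate statistics are also required, not only on the partition being split but also on the related indexes, especially those on the partition key. During the split Oracle runs two recursive queries, one with < the split value and one with >= the split value, to confirm that one of the resulting partitions is empty. It is therefore best to have an index on the partition key, otherwise these queries can take a long time. Do not assume that rownum<2 makes them fast, as an earlier case showed.
For example: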

ALTER TABLE largetab SPLIT PARTITION p5000000 AT (900000)
INTO
(
PARTITION p900000,
PARTITION p5000000
)

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.02 0.02 1 36 79 0
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.03 0.03 1 36 79 0

select /*+ FIRST_ROWS(1) PARALLEL("LARGETAB", 1) */ 1
from
NO_CROSS_CONTAINER("ANBOB"."LARGETAB") PARTITION ("P5000000") where ( ( (
( "ID" < 900000 ) ) ) ) and rownum < 2

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 10 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.00 0.00 0 3 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 0.00 0.00 0 13 0 0

Misses in library cache during parse: 1
Optimizer mode: FIRST_ROWS
Parsing user id: 106 (recursive depth: 1)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
0 0 0 COUNT STOPKEY (cr=3 pr=0 pw=0 time=31 us starts=1)
0 0 0 PARTITION RANGE SINGLE PARTITION: 3 3 (cr=3 pr=0 pw=0 time=26 us starts=1 cost=0 size=13 card=1)
0 0 0 INDEX RANGE SCAN IDX_LARGETAB PARTITION: 3 3 (cr=3 pr=0 pw=0 time=21 us starts=1 cost=0 size=13 card=1)(object id 74334)

select /*+ FIRST_ROWS(1) PARALLEL("LARGETAB", 1) */ 1
from
NO_CROSS_CONTAINER("ANBOB"."LARGETAB") PARTITION ("P5000000") where ( ( (
( "ID" >= 900000 ) ) ) ) and rownum < 2

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 10 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.00 0.00 0 3 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 0.00 0.00 0 13 0 1

Misses in library cache during parse: 1
Optimizer mode: FIRST_ROWS
Parsing user id: 106 (recursive depth: 1)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
1 1 1 COUNT STOPKEY (cr=3 pr=0 pw=0 time=183 us starts=1)
1 1 1 PARTITION RANGE SINGLE PARTITION: 3 3 (cr=3 pr=0 pw=0 time=179 us starts=1 cost=1 size=10393357 card=799489)
1 1 1 INDEX RANGE SCAN IDX_LARGETAB PARTITION: 3 3 (cr=3 pr=0 pw=0 time=172 us starts=1 cost=1 size=10393357 card=799489)(object id 74334)
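
The decision that the two recursive probes feed can be summarized in a few lines. This Python sketch models only the emptiness check described in the text; the storage-attribute and statistics conditions are not modelled, and `classify_split` is a hypothetical name rather than anything Oracle exposes.

```python
def classify_split(found_below, found_at_or_above):
    """Classify a split from the two probe results.

    found_below:        the "< split value" probe returned a row
    found_at_or_above:  the ">= split value" probe returned a row
    Returns (kind, empty_side).
    """
    if found_below and found_at_or_above:
        return ("regular", None)   # rows on both sides, data must be moved
    if found_below:
        return ("fast", "upper")   # nothing at or above the split value
    if found_at_or_above:
        return ("fast", "lower")   # nothing below the split value
    return ("fast", "both")        # the source partition holds no rows

if __name__ == "__main__":
    # In the trace above the "<" probe fetched 0 rows and the ">=" probe 1 row.
    print(classify_split(False, True))
```

In this example the new lower partition P900000 is the empty one, which is why the split touched so few blocks.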

Events for diagnosing fast split partition

ALTER SESSION SET tracefile_identifier='MYTRACE';
ALTER SESSION SET MAX_DUMP_FILE_SIZE = unlimited;
ALTER SESSION SET events '10046 TRACE NAME CONTEXT FOREVER, LEVEL 12';
alter session set events '14525 trace name context forever, level 2';
-- run the split operation here

Note that starting from 12.1 there is no segment created for a resulting partition that is empty.


Oracle DUL supports Oracle 20c


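I previously tested this in "DUL supports Oracle 19c". The official Oracle 20c documentation has now been published. By convention, the cloud and engineered-systems release arrives in the first quarter of 2020 and the downloadable release for non-engineered systems in the second quarter. I got an early taste by setting up a test build to check whether DUL still supports 20c, including blockchain tables.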

[oracle@anbob19 ~]$ . 20cenv
[oracle@anbob19 ~]$ ora

SQL*Plus: Release 20.0.0.0.0 - Production on Wed Feb 19 21:24:17 2020
Version 20.2.0.0.0
Copyright (c) 1982, 2020, Oracle.  All rights reserved.
SQL> show pdbs

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 ORCLPDB                        MOUNTED

SQL> alter pluggable database orclpdb open;
Pluggable database altered.

SQL> alter session set container=orclpdb;
Session altered.

SQL> create user anbob identified by anbob;
User created.

SQL> grant create session,create table to anbob;
Grant succeeded.

SQL> alter user anbob quota unlimited on users;
User altered.

SQL> create table anbob.tobj as select * from dba_objects;
Table created.

SQL> create blockchain table bc_tab1(id number,name varchar2(10), price number (6,2))
  2  no drop until 31 days idle
  3  no delete locked
  4  hashing using "SHA2_512" version "v1.0";
create blockchain table bc_tab1(id number,name varchar2(10), price number (6,2))
*
ERROR at line 1:
ORA-05716: unsupported hashing algorithm V1.0

SQL> create blockchain table anbob.bc_tab1(id number,name varchar2(10), price number (6,2))
  2  no drop until 31 days idle
  3  no delete locked
  4  hashing using "SHA2_512" version "v1";

Table created.

SQL> insert into anbob.bc_tab1 values(1,'anbob',100.88);
1 row created.

SQL> insert into anbob.bc_tab1 values(2,'weejar',100.88);
1 row created.

SQL> commit;
Commit complete.

SQL> update anbob.bc_tab1 set id=id+1;
update anbob.bc_tab1 set id=id+1
             *
ERROR at line 1:
ORA-05715: operation not allowed on the blockchain table

SQL> alter system flush buffer_cache;
System altered.

-- DUL

[oracle@anbob19 tools]$ ./dul

Data UnLoader: 12.0.0.0.3 - Internal Only - on Wed Feb 19 21:41:45 2020
with 64-bit io functions and the decompression option

Copyright (c) 1994 2019 Bernard van Duijnen All rights reserved.

 Strictly Oracle Internal Use Only


Within one week you will need a more recent DUL version for this os
DUL: Warning: Recreating file "dul.log"
DUL: Warning: ulimit process stack size is only 33554432
Found db_id = 4226385268
Found db_name = ANBOB20C
DUL> show datafiles;
ts# rf# start   blocks offs open  err file name
  0   1     0    35841    0    1    0 /u01/app/oracle/oradata/ANBOB20C/orclpdb/system01.dbf                                                                 
  1   4     0    46081    0    1    0 /u01/app/oracle/oradata/ANBOB20C/orclpdb/sysaux01.dbf                                                                      
  2   9     0    12801    0    1    0 /u01/app/oracle/oradata/ANBOB20C/orclpdb/undotbs01.dbf                                                                      
  5  12     0     1921    0    1    0 /u01/app/oracle/oradata/ANBOB20C/orclpdb/users01.dbf
                                                                      
DUL> bootstrap;

DUL> desc anbob.tobj
  2  ;
Table ANBOB.TOBJ
obj#= 74578, dataobj#= 74578, ts#= 5, file#= 12, block#=130
      tab#= 0, segcols= 27, clucols= 0
Column information:
icol# 01 segcol# 01        OWNER len  128 type  1 VARCHAR2 cs 873(AL32UTF8)
icol# 02 segcol# 02  OBJECT_NAME len  128 type  1 VARCHAR2 cs 873(AL32UTF8)
icol# 03 segcol# 03 SUBOBJECT_NAME len  128 type  1 VARCHAR2 cs 873(AL32UTF8)
icol# 04 segcol# 04    OBJECT_ID len   22 type  2 NUMBER(0)
icol# 05 segcol# 05 DATA_OBJECT_ID len   22 type  2 NUMBER(0)
icol# 06 segcol# 06  OBJECT_TYPE len   23 type  1 VARCHAR2 cs 873(AL32UTF8)
icol# 07 segcol# 07      CREATED len    7 type 12 DATE
icol# 08 segcol# 08 LAST_DDL_TIME len    7 type 12 DATE
icol# 09 segcol# 09    TIMESTAMP len   19 type  1 VARCHAR2 cs 873(AL32UTF8)
icol# 10 segcol# 10       STATUS len    7 type  1 VARCHAR2 cs 873(AL32UTF8)
icol# 11 segcol# 11    TEMPORARY len    1 type  1 VARCHAR2 cs 873(AL32UTF8)
icol# 12 segcol# 12    GENERATED len    1 type  1 VARCHAR2 cs 873(AL32UTF8)
icol# 13 segcol# 13    SECONDARY len    1 type  1 VARCHAR2 cs 873(AL32UTF8)
icol# 14 segcol# 14    NAMESPACE len   22 type  2 NUMBER(0)
icol# 15 segcol# 15 EDITION_NAME len  128 type  1 VARCHAR2 cs 873(AL32UTF8)
icol# 16 segcol# 16      SHARING len   18 type  1 VARCHAR2 cs 873(AL32UTF8)
icol# 17 segcol# 17  EDITIONABLE len    1 type  1 VARCHAR2 cs 873(AL32UTF8)
icol# 18 segcol# 18 ORACLE_MAINTAINED len    1 type  1 VARCHAR2 cs 873(AL32UTF8)
icol# 19 segcol# 19  APPLICATION len    1 type  1 VARCHAR2 cs 873(AL32UTF8)
icol# 20 segcol# 20 DEFAULT_COLLATION len  100 type  1 VARCHAR2 cs 873(AL32UTF8)
icol# 21 segcol# 21   DUPLICATED len    1 type  1 VARCHAR2 cs 873(AL32UTF8)
icol# 22 segcol# 22      SHARDED len    1 type  1 VARCHAR2 cs 873(AL32UTF8)
icol# 23 segcol# 23 IMPORTED_OBJECT len    1 type  1 VARCHAR2 cs 873(AL32UTF8)
icol# 24 segcol# 24 CREATED_APPID len   22 type  2 NUMBER(0)
icol# 25 segcol# 25 CREATED_VSNID len   22 type  2 NUMBER(0)
icol# 26 segcol# 26 MODIFIED_APPID len   22 type  2 NUMBER(0)
icol# 27 segcol# 27 MODIFIED_VSNID len   22 type  2 NUMBER(0)

DUL> desc anbob.bc_tab1;

Table ANBOB.BC_TAB1
obj#= 74580, dataobj#= 74580, ts#= 5, file#= 12, block#=1794
      tab#= 0, segcols= 13, clucols= 0
Column information:
icol# 01 segcol# 01           ID len   22 type  2 NUMBER(0)
icol# 02 segcol# 02         NAME len   10 type  1 VARCHAR2 cs 873(AL32UTF8)
icol# 03 segcol# 03        PRICE len   22 type  2 NUMBER(6,2)
icol# 04 segcol# 04 ORABCTAB_INST_ID$ len   22 type  2 NUMBER(0)
icol# 05 segcol# 05 ORABCTAB_CHAIN_ID$ len   22 type  2 NUMBER(0)
icol# 06 segcol# 06 ORABCTAB_SEQ_NUM$ len   22 type  2 NUMBER(0)
icol# 07 segcol# 07 ORABCTAB_CREATION_TIME$ len   13 type 181 TIMESTAMP(9) WITH TIME ZONE
icol# 08 segcol# 08 ORABCTAB_USER_NUMBER$ len   22 type  2 NUMBER(0)
icol# 09 segcol# 09 ORABCTAB_HASH$ len 2000 type 23 RAW
icol# 10 segcol# 10 ORABCTAB_SIGNATURE$ len 2000 type 23 RAW
icol# 11 segcol# 11 ORABCTAB_SIGNATURE_ALG$ len   22 type  2 NUMBER(0)
icol# 12 segcol# 12 ORABCTAB_SIGNATURE_CERT$ len   16 type 23 RAW
icol# 13 segcol# 13 ORABCTAB_SPARE$ len 2000 type 23 RAW

DUL> unload table anbob.tobj;
. unloading table                      TOBJ   73977 rows unloaded
DUL> unload table anbob.bc_tab1;
. unloading table                   BC_TAB1       2 rows unloaded

[oracle@anbob19 tools]$ cat ANBOB_BC_TAB1.dat
|1| |anbob| |100.88| |1| |31| |1| |19-FEB-2020 AD 13:33:45.852176000| |0| |1688D135A82CDDB17B470A9A016A8BF5F8D5C22DDDA8EA64715FEA7DD2EEFD1EB22DCA4E6AD762254F5D46B4AEC6080C38C5E10404EA601C84F92CC2EDAB637E| || || || ||
|2| |weejar| |100.88| |1| |31| |2| |19-FEB-2020 AD 13:33:45.863239000| |0| |623A61C11E71DC681F55CCD5CE89E77044988E536461392F7B43E1581F0392A46A90018E623C7AA237C39A7F51B99E70AD0217F5242283B058C5B1247E529171| || || || ||

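DUL still works against Oracle 20c. The only difference is that the hidden columns of the blockchain table are unloaded as well. Once the data has been unloaded, the rest of the recovery is straightforward.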
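While the database is still open, the same hidden columns can be cross-checked from SQL. This is a minimal sketch, assuming the column names reported by the DUL `desc anbob.bc_tab1` output above. Hidden columns are not returned by `select *`, so they have to be listed by name:

```sql
-- Hidden blockchain columns must be named explicitly; "select *" omits them.
-- Column names are taken from the DUL "desc anbob.bc_tab1" output above.
select id,
       name,
       price,
       orabctab_chain_id$,
       orabctab_seq_num$,
       orabctab_creation_time$,
       rawtohex(orabctab_hash$) as row_hash
  from anbob.bc_tab1
 order by orabctab_chain_id$, orabctab_seq_num$;
```

If the unload is faithful, `row_hash` should match the long hex strings in ANBOB_BC_TAB1.dat.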

12c wait library cache lock self-deadlock when compile EDITIONABLE Procedure


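A while ago I ran into a case where compiling an invalid procedure made the session block itself, waiting on 'library cache lock'. The database version was Oracle 12.2. The procedure calls nested procedures over dblinks that span 3 databases. Looking at the procedure definition, I noticed it carried the "EDITIONABLE" keyword. Editions were introduced in 11.2, and I only just noticed that from 12c onward EDITIONABLE is the default. EDITIONABLE means that multiple versions of an object, such as a view, synonym or PL/SQL object, can exist in the database. Tim Hall's notes cover the topic and are a good reference. This post only records how the problem was handled.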

SQL> @s 4878

    SID SQLID_AND_CHILD      STATUS   STATE   EVENT                                          SEQ# SEC_IN_WAIT BLOCKING_SID P1                 P2                 P3                 P1TRANSL
------- -------------------- -------- ------- ---------------------------------------- ---------- ----------- ------------ ------------------ ------------------ ------------------ ------------------------------------------
   4878 dc26hs56954yy 0      ACTIVE   WAITING library cache lock                              958         292 4878         handle address=    lock address=      100*mode+namespace
                                                                                                                           0x000000033CB97E18 0x000000021ED50468 = 65539

SQL> @usid 4878    

USERNAME                SID                 AUDSID OSUSER           MACHINE            PROGRAM              SPID             OPID CPID                     SQL_ID           HASH_VALUE   LASTCALL STATUS   SADDR            PADDR            TADDR            LOGON_TIME
----------------------- -------------- ----------- ---------------- ------------------ -------------------- -------------- ------ ------------------------ --------------- ----------- ---------- -------- ---------------- ---------------- ---------------- -----------------
SYS                      '4878,8981'    4294967295 oracle           anbob2             (TNS V1-V3)          75954             112 75953                    dc26hs56954yy    1284674526         44 ACTIVE   0000000404159630 00000004234D9CC0 000000043D733238 20191231 09:39:36

SQL> oradebug setorapid 112
Oracle pid: 112, Unix process pid: 75954, image: oracle@anbob2 (TNS V1-V3)

SQL> oradebug dump errorstack 1;
Statement processed.

SQL> oradebug short_stack;
ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-__poll()+16<-ipcgxp_selectex()+409<-ipclw_wait()+1045<-ksxpwait_ipclw()+3844<-ksxpwait_int()+22103<-ksxpwait()+845<-ksliwat()+10782<-kslwaitctx()+197<-kjusuc()+9058<-ksipgetctxia()+5359<-ksixpgetia()+167<-kqlmLock()+3201<-kqlmClusterLock()+209<-kgllkal()+3424<-kglLock()+1307<-kglget()+290<-kkdllk0()+427<-kqlrde()+1804<-kglrdi()+470<-kglrlo()+1016<-kqlrld()+3812<-kqlrldop()+121<-kqlLoadRemoteObject()+615<-kqllod()+242<-kglobld()+1080<-kglobpn()+2375<-kglpim()+425<-kglpin()+1672<-kglgob()+545<-kgldpo0()+673<-kgldpo()+89<-kgldon()+280<-pkldon()+94<-pkloud()+278<-phnnrl_name_resolve_by_loading()+3939<-phngdl_get_defining_libunit()+155<-phnrpls_resolve_prefix_libscope()+224<-phnrp_resolve_prefix()+138<-phnr_resolve()+224<-ph2o_get_cands()+343<-ph2o_overload_diana()+265<-ph2stm()+12894<-ph2sms()+243<-ph2blo()+539<-ph2obl()+111<-ph2uni()+4595<-ph2dr2()+338<-ph2drv()+304<-phpsem()+62<-phpcmp()+1543<-pcicmp0()+468<-kkxcmp0()+976<-rpiswu2()+627<-kkxcmp()+258<-kkpalt()+2564<-opiexe()+22930<-opiosq0()+4766<-opiodr()+1229<-rpidrus()+201<-skgmstack()+65<-rpidru()+134<-rpiswu2()+627<-rpidrv()+1540<-rpisplu_internal()+471<-kqlvld()+4104<-kglgob()+2737<-kkdlLoadDDL()+3444<-qcdlgbo()+8243<-qcdlgob()+1005<-qcsfgob()+290<-qcsprfro()+531<-qcsprfro_tree()+380<-qcsprfro_tree()+150<-qcspafq()+246<-qcspqbDescendents()+278<-qcspqb()+272<-kkmdrv()+192<-opiSem()+1978<-opiprs()+333<-kksParseChildCursor()+541<-rpiswu2()+627<-kksLoadChild()+5470<-kxsGetRuntimeLock()+2035<-kksfbc()+15083<-kkspsc0()+2130<-kksParseCursor()+123<-opiosq0()+2391<-kpooprx()+404<-kpoal8()+850<-opiodr()+1229<-ttcpip()+1257<-opitsk()+1940<-opiino()+941<-opiodr()+1229<-opidrv()+1021<-sou2o()+145<-opimai_real()+455<-ssthrdmain()+417<-main()+262<-__libc_start_main()+245 

SQL>  oradebug short_stack;
ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-__poll()+16<-ipcgxp_selectex()+409<-ipclw_wait()+1045<-ksxpwait_ipclw()+3844<-ksxpwait_int()+22103<-ksxpwait()+845<-ksliwat()+10782<-kslwaitctx()+197<-kjusuc()+9058<-ksipgetctxia()+5359<-ksixpgetia()+167<-kqlmLock()+3201<-kqlmClusterLock()+209<-kgllkal()+3424<-kglLock()+1307<-kglget()+290<-kkdllk0()+427<-kqlrde()+1804<-kglrdi()+470<-kglrlo()+1016<-kqlrld()+3812<-kqlrldop()+121<-kqlLoadRemoteObject()+615<-kqllod()+242<-kglobld()+1080<-kglobpn()+2375<-kglpim()+425<-kglpin()+1672<-kglgob()+545<-kgldpo0()+673<-kgldpo()+89<-kgldon()+280<-pkldon()+94<-pkloud()+278<-phnnrl_name_resolve_by_loading()+3939<-phngdl_get_defining_libunit()+155<-phnrpls_resolve_prefix_libscope()+224<-phnrp_resolve_prefix()+138<-phnr_resolve()+224<-ph2o_get_cands()+343<-ph2o_overload_diana()+265<-ph2stm()+12894<-ph2sms()+243<-ph2blo()+539<-ph2obl()+111<-ph2uni()+4595<-ph2dr2()+338<-ph2drv()+304<-phpsem()+62<-phpcmp()+1543<-pcicmp0()+468<-kkxcmp0()+976<-rpiswu2()+627<-kkxcmp()+258<-kkpalt()+2564<-opiexe()+22930<-opiosq0()+4766<-opiodr()+1229<-rpidrus()+201<-skgmstack()+65<-rpidru()+134<-rpiswu2()+627<-rpidrv()+1540<-rpisplu_internal()+471<-kqlvld()+4104<-kglgob()+2737<-kkdlLoadDDL()+3444<-qcdlgbo()+8243<-qcdlgob()+1005<-qcsfgob()+290<-qcsprfro()+531<-qcsprfro_tree()+380<-qcsprfro_tree()+150<-qcspafq()+246<-qcspqbDescendents()+278<-qcspqb()+272<-kkmdrv()+192<-opiSem()+1978<-opiprs()+333<-kksParseChildCursor()+541<-rpiswu2()+627<-kksLoadChild()+5470<-kxsGetRuntimeLock()+2035<-kksfbc()+15083<-kkspsc0()+2130<-kksParseCursor()+123<-opiosq0()+2391<-kpooprx()+404<-kpoal8()+850<-opiodr()+1229<-ttcpip()+1257<-opitsk()+1940<-opiino()+941<-opiodr()+1229<-opidrv()+1021<-sou2o()+145<-opimai_real()+455<-ssthrdmain()+417<-main()+262<-__libc_start_main()+245

-- the diag trace (HM) also recorded it
*** 2019-12-31T10:04:41.401172+08:00
HM: Session with ID 4878 serial # 8981 (FG) on read/write instance 2 is hung
    and is waiting on 'library cache lock' for 96 seconds.
    Session was previously waiting on 'SQL*Net message from dblink'.
    Final Blocker is Session ID 4878 serial# 8981 on instance 2
     which is waiting on 'library cache lock' for 96 seconds
     p1: 'handle address'=0x33cb97e18, p2: 'lock address'=0x21ed50468, p3: '100*mode+namespace'=0x10003

*** 2019-12-31T10:15:10.176899+08:00
HM: Short Stack of self-deadlocked session ID 4878, OSPID 75954 of hang ID 12
Short stack dump:
ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-__poll()+16<-ipcgxp_selectex()+409<-ipclw_wait()+1045<-ksxpwait_ipclw()+3844<-ksxpwait_int()+22103<-ksxpwait()+845<-ksliwat()+10782<-kslwaitctx()+197<-kjusuc()+9058<-ksipgetctxia()+5359<-ksixpgetia()+167<-kqlmLock()+3201<-kqlmClusterLock()+209<-kgllkal()+3424<-kglLock()+1307<-kglget()+290<-kkdllk0()+427<-kqlrde()+1804<-kglrdi()+470<-kglrlo()+1016<-kqlrld()+3812<-kqlrldop()+121<-kqlLoadRemoteObject()+615<-kqllod()+242<-kglobld()+1080<-kglobpn()+2375<-kglpim()+425<-kglpin()+1672<-kglgob()+545<-kgldpo0()+673<-kgldpo()+89<-kgldon()+280<-pkldon()+94<-pkloud()+278<-phnnrl_name_resolve_by_loading()+3939<-phngdl_get_defining_libunit()+155<-phnrpls_resolve_prefix_libscope()+224<-phnrp_resolve_prefix()+138<-phnr_resolve()+224<-ph2o_get_cands()+343<-ph2o_overload_diana()+265<-ph2stm()+12894<-ph2sms()+243<-ph2blo()+539<-ph2obl()+111<-ph2uni()+4595<-ph2dr2()+338<-ph2drv()+304<-phpsem()+62<-phpcmp()+1543<-pcicmp0()+468<-kkxcmp0()+976<-rpiswu2()+627<-kkxcmp()+258<-kkpalt()+2564<-opiexe()+22930<-opiosq0()+4766<-opiodr()+1229<-rpidrus()+201<-skgmstack()+65<-rpidru()+134<-rpiswu2()+627<-rpidrv()+1540<-rpisplu_internal()+471<-kqlvld()+4104<-kglgob()+2737<-kkdlLoadDDL()+3444<-qcdlgbo()+8243<-qcdlgob()+1005<-qcsfgob()+290<-qcsprfro()+531<-qcsprfro_tree()+380<-qcsprfro_tree()+150<-qcspafq()+246<-qcspqbDescendents()+278<-qcspqb()+272<-kkmdrv()+192<-opiSem()+1978<-opiprs()+333<-kksParseChildCursor()+541<-rpiswu2()+627<-kksLoadChild()+5470<-kxsGetRuntimeLock()+2035<-kksfbc()+15083<-kkspsc0()+2130<-kksParseCursor()+123<-opiosq0()+2391<-kpooprx()+404<-kpoal8()+850<-opiodr()+1229<-ttcpip()+1257<-opitsk()+1940<-opiino()+941<-opiodr()+1229<-opidrv()+1021<-sou2o()+145<-opimai_real()+455<-ssthrdmain()+417<-main()+262<-__libc_start_main()+245

-- function calls --
ipcgxp_selectex()+409                          inter process calls  [partial hit for: ipc ] 
ipclw_wait()+1045                              inter process calls lightweight (exafusion)  [partial hit for: ipclw ] 
ksxpwait_ipclw()+3844                          kernel service  cross instance cross instance ipc  [partial hit for: ksxp ] 
ksxpwait_int()+22103                           kernel service  cross instance cross instance ipc  [partial hit for: ksxp ] 
ksxpwait()+845                                 kernel service  cross instance cross instance ipc  [partial hit for: ksxp ] 
ksliwat()+10782                                kernel service  latching and post-wait inner wait function; setup a wait that times out 
kslwaitctx()+197                               kernel service  latching and post-wait wait for n centi-seconds or until posted wait context; wait until timeout 
kjusuc()+9058                                  kernel lock management global enqueue service synchronous open and convert a lock 
ksipgetctxia()+5359                            kernel service  instance locking get a group lock (synchronous interface to DLM for lock get)  [partial hit for: ksipget ] 
ksixpgetia()+167                               kernel service  instance locking  [partial hit for: ksi ] 
kqlmLock()+3201                                kernel query library cache multi-instance manager  [partial hit for: kqlm ] 
kqlmClusterLock()+209                          kernel query library cache multi-instance manager  [partial hit for: kqlm ] 
kgllkal()+3424                                 kernel generic library cache management library cache lock allocate 
kglLock()+1307                                 kernel generic library cache management library cache lock 
kglget()+290                                   kernel generic library cache management get a lock on an object 
kkdllk0()+427                                  kernel compile dictionary lookup lock an object  [partial hit for: kkdllk ] 
kqlrde()+1804                                  kernel query library cache remote  [partial hit for: kqlr ] 
kglrdi()+470                                   kernel generic library cache management  [partial hit for: kgl ] 
kglrlo()+1016                                  kernel generic library cache management  [partial hit for: kgl ] 
kqlrld()+3812                                  kernel query library cache remote load a remote library object 
kqlrldop()+121                                 kernel query library cache remote load a remote library object  [partial hit for: kqlrld ] 
kqlLoadRemoteObject()+615                      kernel query library cache  [partial hit for: kql ] 
kqllod()+242                                   kernel query library cache database object load 
kglobld()+1080                                 kernel generic library cache management object load 
kglobpn()+2375                                 kernel generic library cache management object pin heaps and load data pieces 
kglpim()+425                                   kernel generic library cache management pin and load more heaps 
kglpin()+1672                                  kernel generic library cache management pin heaps and load data pieces of an object 
kglgob()+545                                   kernel generic library cache management get an objected locked and pinned 
kgldpo0()+673                                  kernel generic library cache management depend on an object  [partial hit for: kgldpo ] 
kgldpo()+89                                    kernel generic library cache management depend on an object 
kgldon()+280                                   kernel generic library cache management depend on an object 
pkldon()+94                                    PLSQL KG interface  [partial hit for: pk ] 
pkloud()+278                                   PLSQL KG interface  [partial hit for: pk ] 
phnnrl_name_resolve_by_loading()+3939          PLSQL semantics  [partial hit for: phn ] 
phngdl_get_defining_libunit()+155              PLSQL semantics  [partial hit for: phn ] 
phnrpls_resolve_prefix_libscope()+224          PLSQL semantics  [partial hit for: phn ] 
phnrp_resolve_prefix()+138                     PLSQL semantics  [partial hit for: phn ] 
phnr_resolve()+224                             PLSQL semantics  [partial hit for: phn ] 
ph2o_get_cands()+343                           PLSQL phase 2 (semantic analyzer)  [partial hit for: ph2 ] 
ph2o_overload_diana()+265                      PLSQL phase 2 (semantic analyzer)  [partial hit for: ph2 ] 
ph2stm()+12894                                 PLSQL phase 2 (semantic analyzer) statement(?) 
ph2sms()+243                                   PLSQL phase 2 (semantic analyzer) process statements 
ph2blo()+539                                   PLSQL phase 2 (semantic analyzer) idl node D_BLOCK, D_DECL 

-- ctrl +c --
SQL> alter procedure  xxx.ap_rec_statxxxx compile;
^Calter procedure  xxx.ap_rec_statxxxx compile
*
ERROR at line 1:
ORA-04052: error occurred when looking up remote object xxx.AP_UPDATE_xxxx@db1xx
ORA-01013: user requested cancel of current operation


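Solution:
At first I suspected the dblink. After a good deal of trial and error, it turned out that dropping the procedure and recreating it resolved the problem. This is recorded for reference only; I found no matching known bug.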
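Before dropping the procedure, it helps to record its state and save its source. The following is a sketch only: `XXX` and `AP_REC_STATXXXX` stand in for the masked owner and procedure name from the session above, and they need to be replaced with the real names.

```sql
-- Check the status and EDITIONABLE property of the procedure before and
-- after recreating it. XXX / AP_REC_STATXXXX are placeholders for the
-- masked owner and procedure name shown in the session above.
select owner, object_name, object_type, status, editionable, last_ddl_time
  from dba_objects
 where owner = 'XXX'
   and object_name = 'AP_REC_STATXXXX'
   and object_type = 'PROCEDURE';

-- Capture the current source so it can be recreated after the drop.
set long 1000000
select dbms_metadata.get_ddl('PROCEDURE', 'AP_REC_STATXXXX', 'XXX') from dual;
```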

Oracle 20c New Features (1): Blockchain Tables


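Oracle Database is a highly converged product, to the point that current releases can feel overly large, yet it undeniably remains the benchmark for learning traditional databases. When machine learning (ML/AI) was brought into the database a few years ago some people questioned it, but it now looks like a forward-looking decision, and ADW/ATP have won user acceptance in the real world. Oracle 20c again introduces new features. Against the current backdrop in China, continuing to follow Oracle may puzzle some people, but walking away from it may mean missing a lot.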

Oracle Database 20c New feature:

    • BlockChain table type
    • Desupport of the non-CDB architecture; the multitenant architecture becomes mandatory, with up to 3 PDBs at no extra cost
    • Native Persistent Database Memory (PMEM)
    • Automated Machine Learning
    • Binary JSON data type for better performance
    • Oracle Spatial and Graph
    • Lots of machine learning algorithms

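Foremost among the 20c new features are blockchain tables. They draw on the tamper-resistant trust mechanism of blockchain, while Oracle's view is that not every project needs the full capabilities of blockchain technology or the complexity of bringing a blockchain into its IT environment. Blockchain tables let customers use Oracle Database when they need highly tamper-resistant data management but do not need to distribute a ledger across multiple organizations or rely on a decentralized trust model. They can also be used to improve data integrity and fraud protection for services provided by a central authority (for example an escrow company).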

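A blockchain table is a new, specialized table type that provides a highly tamper-resistant persistence option inside Oracle Database. It allows insert operations only, disallows updates and other modifications, and restricts deletes. To further strengthen tamper resistance, rows are chained by storing the hash of the previous row in the current row, which lets users verify that nothing has been modified. Users can also optionally sign row content with PKI-based signatures using X.509 certificates. Both the signature and the data integrity can be verified, which ensures non-repudiation.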

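Insider attacks on a database are not only the hardest to identify; stolen user or DBA credentials can also open up large amounts of critical data to outside hackers. Blockchain tables provide stronger database security by preventing user fraud (someone rewriting history, deleting certain data, or encrypting it for ransom). Combined with Oracle Database Vault, they can also guard against DBA actions. Even if Flashback Database is used to roll the database back to an earlier point in time and the subsequent inserts are replaced with fraudulent entries, you can compare the hashes and PKI signatures with an external audit log to verify data integrity and detect any tampering.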

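Blockchain tables are an easy-to-use yet highly tamper-resistant centralized data management option. They transparently support existing applications, let users and developers use the rich set of Oracle development tools and environments, and can be joined in queries with non-blockchain tables. Using blockchain tables is much easier than implementing a blockchain network and decentralized applications: no new infrastructure is needed, because the feature ships as part of Oracle Database. Use of the tables can be transparent to existing applications, and developers keep their current architecture and programming model by accessing the tables through SQL, PL/SQL, JDBC and other interfaces. The tables also benefit from other database features such as partitioning, In-Memory, backup and archiving. Insert performance is lower than for a normal table, but the load rate is still efficient compared with commit consensus on a distributed blockchain.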

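Comparing the insert rate of a blockchain table and a normal table (a difference of more than 10x, about 15x in this run; the test environment was completely idle)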

create blockchain table anbob.bc_tab2(id number,name varchar2(10), price number (  6,2))
 no drop until 31 days idle
 no delete locked
 hashing using "SHA2_512" version "v1";

create table anbob.tab2  (id number,name varchar2(10), price number (  6,2));

SQL> set timing on
SQL> begin
for i in 1..10000 loop
 insert into anbob.bc_tab2 values(i,'anbob'||i,100);
 commit;
end loop;
 end;
 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:13.26

SQL> begin
for i in 1..10000 loop
 insert into anbob.tab2 values(i,'anbob'||i,100);
 commit;
end loop;
 end;
 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.86


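Why use blockchain tables?

Although emerging distributed applications benefit from a decentralized trust model, most applications today have a central authority (banks, escrow companies, trading exchanges, government offices and so on). With blockchain tables in Oracle Database, these applications can be made more secure without the added complexity of moving from hosted servers to a decentralized model. That is the reason to use the native blockchain tables of Oracle Database 20c. For example, blockchain tables can be applied to financial transaction logs, medical records, audit trails, regulatory compliance data, financial records, FDA reports on clinical trials, legal hold data, and centralized chain-of-custody or provenance information, making these applications more secure and their data tamper-resistant.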

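Database blockchain tables compared with decentralized blockchains:
Properties common to blockchain tables and blockchain platforms:

  1. Append-only data
  2. Chaining with cryptographic hashes for tamper detection
  3. Digitally signed updates (optional in blockchain tables, mandatory in blockchain platform transactions)

Additional features or properties of blockchain tables:

  1. Built-in database backup and archiving, partitioning, access control
  2. Higher throughput for insert transactions
  3. A programming model based on SQL and PL/SQL
  4. No new infrastructure required beyond Oracle Database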

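Using blockchain tables

Oracle blockchain tables come with new syntax; the blockchain table anbob.bc_tab2 was already created above. The commonly used column data types are supported, but LONG is not. Only INSERT is allowed: other DML, DROP TABLE and DROP PARTITION are not permitted. Direct-path loads, converting a blockchain table to a non-blockchain table, and replicating updates with OGG are not allowed either.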

create blockchain table anbob.bc_tab2(id number,name varchar2(10), price number ( 6,2))
no drop until 31 days idle
no delete locked
hashing using "SHA2_512" version "v1";

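1. NO DROP [UNTIL n DAYS IDLE] is a mandatory clause. It either prevents the table from being dropped at all, or prevents it from being dropped until the configured period of inactivity has passed. The minimum value of n is 16. It can be increased with ALTER TABLE but not decreased.

2. NO DELETE [LOCKED] or NO DELETE UNTIL n DAYS AFTER INSERT [LOCKED] sets the row deletion policy: rows can either never be deleted, or be deleted only once n days have passed since they were inserted. Each row carries a hidden timestamp column for this purpose. For a truly permanent ledger, row deletion can be prohibited; otherwise deletion can be allowed after a retention period of n days, where the minimum value of n is again 16. This value too can be increased but not decreased.

3. HASHING USING "sha2_512" VERSION "v1" is also a mandatory clause. It specifies that the hash used for row chaining is computed with the SHA2 algorithm with a 512-bit output length.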
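The "increase only" rule for the retention clauses can be tried with ALTER TABLE. This is a hedged sketch against the anbob.bc_tab2 table created above; the values 40 and 20 are arbitrary illustrations, and the exact error raised by the second statement is not shown because it was not captured in the original test.

```sql
-- Raise the idle-retention period of the table created above from 31 to 40 days.
-- 40 is an arbitrary illustrative value; per the text it can only go up.
alter table anbob.bc_tab2 no drop until 40 days idle;

-- Going back down to a smaller value is expected to be rejected.
alter table anbob.bc_tab2 no drop until 20 days idle;
```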

To work with blockchain tables from PL/SQL, Oracle provides three packages with a range of functions: DBMS_BLOCKCHAIN_TABLE, DBMS_TABLE_DATA and DBMS_USER_CERTS. The blockchain-related views are xx_BLOCKCHAIN_TABLES and xx_CERTIFICATES.
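As a rough illustration of the view and package named above, the sketch below lists the settings recorded for anbob.bc_tab2 and asks the database to verify its row chain. The positional argument list of VERIFY_ROWS follows the form I recall from the Oracle documentation and should be treated as an assumption; check the package specification in your release before relying on it.

```sql
-- Retention settings and hash algorithm recorded for the table created above.
select * from dba_blockchain_tables where table_name = 'BC_TAB2';

-- Verify the hash chain of all rows. The arguments after the table name
-- (low/high timestamp, instance id, chain id) are left NULL to cover
-- everything; the last argument returns the number of rows verified.
set serveroutput on
declare
  l_rows number;
begin
  dbms_blockchain_table.verify_rows('ANBOB', 'BC_TAB2', null, null, null, null, l_rows);
  dbms_output.put_line('rows verified: ' || l_rows);
end;
/
```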

 

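That is all for now. Although this 20c feature is great, Oracle 19c is still the recommended upgrade target at present (2020). I will write up other features later.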

 

reference https://blogs.oracle.com/blockchain/native-blockchain-tables-extend-oracle-database%e2%80%99s-multi-model-converged-architecture

Troubleshooting long wait “enq: US – contention”&“enq: IV – contention” after DDL, alert show “libcache interrupt action by LCK”


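This records an incident on a 4-node RAC database running 12c R2. During a busy, highly concurrent period, a DDL caused a large number of foreground sessions to wait for a long time on "enq: US - contention" and "enq: IV - contention". For the SQL involved (an INSERT ... VALUES), only one child cursor could not execute (its sql_exec_start was already hours in the past), while the second child cursor ran at normal speed. In the end the problem was resolved by killing the long-running foreground sessions.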

enq: IV – contention
IV (Library Cache Invalidation): synchronizes library cache object invalidations across instances

enq: US – contention
US (Undo Segment): lock held to perform DDL on the undo segment

db alert log

2020-01-08T09:18:46.175252+08:00
LCK1 (ospid: 62157) waits for event 'libcache interrupt action by LCK' for 72 secs.
2020-01-08T09:18:46.175342+08:00
LCK1 (ospid: 62157) is hung in an acceptable location (libcache 0x41.02).
2020-01-08T09:22:36.304180+08:00
LCK1 (ospid: 62157) waits for event 'libcache interrupt action by LCK' for 84 secs.
2020-01-08T09:22:36.304278+08:00
LCK1 (ospid: 62157) is hung in an acceptable location (libcache 0x41.02).
2020-01-08T09:28:58.571536+08:00
LCK1 (ospid: 62157) waits for event 'libcache interrupt action by LCK' for 74 secs.
2020-01-08T09:28:58.571626+08:00
LCK1 (ospid: 62157) is hung in an acceptable location (libcache 0x41.02).
2020-01-08T09:31:23.535829+08:00
LCK1 (ospid: 62157) waits for event 'libcache interrupt action by LCK' for 225 secs.
2020-01-08T09:31:23.535924+08:00
LCK1 (ospid: 62157) is hung in an acceptable location (libcache 0x41.02).
2020-01-08T09:33:55.656437+08:00
LCK1 (ospid: 62157) waits for event 'libcache interrupt action by LCK' for 64 secs.
2020-01-08T09:33:55.656529+08:00
LCK1 (ospid: 62157) is hung in an acceptable location (libcache 0x41.02).
2020-01-08T09:39:44.754781+08:00
LCK1 (ospid: 62157) waits for event 'libcache interrupt action by LCK' for 84 secs. 

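What is the event 'libcache interrupt action by LCK'?
It appears to indicate that the LCKn process is busy working and is not blocked by another session. It is not a true wait: it is the event shown for the process while it is working through locks to invalidate them. Possible causes when it shows up: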
* A long list of locks to invalidate
* A large reference list for the objects to invalidate
* Some form of linked list corruption

The LCK trace contains the following entries

*** 2020-01-08T02:06:27.966915+08:00
kqlmClusterMessage: sync lock contention.
kqlmClusterMessage: sync lock contention.
kqlmClusterMessage: sync lock contention.
kqlmClusterMessage: sync lock contention.
kqlmClusterMessage: sync lock contention.

Call stack of a foreground (FG) session waiting on "enq: IV - contention"

short stack: ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-semtimedop()+10
<-skgpwwait()+200<-ksliwat()+2300<-kslwaitctx()+197<-kjuscv_wait()+2184<-kjuscv()+1778<-ksicon()+2008<-
kqlmClusterMessage()+1169<-kqrSendInvalidationMessage()+648<-kqrGetClusterLock()+3423<-kqrLockAndPinPo()
+1509<-kqrpre1()+3041<-ktuGetRowCache1()+84<-ktuVerifyUsnForCreate()+154<-ktuFindNextUsnForCrt()+651<-
ktuFindNextFreeUsn()+135<-ktuCreateUndoSegment()+1059<-ktusmcus()+202<-ktusmAddUndoSegment()+897<-
ktusmaus_add_us()+808<-ktubnd()+19002<-ktuchg2()+12560<-ktbchg2()+236<-kdtchg()+1204<-kdtwrp()+2522<-kdtInsRow()+659<

Call stack of the LCK1 process waiting on "enq: IV - contention"

ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-__poll()+16<-sskgxp_selectex()
+309<-skgxpiwait()+1587<-skgxpwaiti()+404<-skgxpwait()+176<-ksxpwait_int()+2930<-ksxpwait()+1664<-ksliwat()
+10782<-kslwaitctx()+197<-kjusuc()+9058<-ksigeti()+6782<-kqlmbinv()+192<-kqlmba()+502<-ksbcti()+247<-ksbabs()
+551<-ksbrdp()+1079<-opirip()+609<-opidrv()+602<-sou2o()+145<-opimai_real()+202<-ssthrdmain()+417<-main()

kslwaitctx()+197 kernel service latching and post-wait wait for n centi-seconds or until posted wait context; wait until timeout
kjusuc()+9058 kernel lock management global enqueue service synchronous open and convert a lock
ksigeti()+6782 kernel service instance locking [partial hit for: ksi ]
kqlmbinv()+192 kernel query library cache multi-instance manager [partial hit for: kqlm ]
kqlmba()+502 kernel query library cache multi-instance manager [partial hit for: kqlm ]
ksbcti()+247 kernel service background processes call timeout/interrupts
ksbabs()+551 kernel service background processes action based server

When the problem occurs, it is advisable to collect:
systemstate and hanganalyze dumps, a persisted copy of ASH, and undostat.
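The collection steps above can be sketched as a short SQL*Plus session. The dump levels and table names are illustrative assumptions, not values from the original incident. On a system with thousands of sessions a level 266 systemstate can be heavy, so a lower level such as 258 may be preferable.

```sql
-- Run as SYSDBA in SQL*Plus on one instance; "-g all" covers every RAC instance.
oradebug setmypid
oradebug unlimit
oradebug -g all hanganalyze 3
oradebug -g all dump systemstate 266

-- Persist ASH and undo statistics before they age out of memory.
-- The table names are hypothetical.
create table ash_20200108 as select * from gv$active_session_history;
create table undostat_20200108 as select * from gv$undostat;
```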

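Analysis of the ASH data
00:39 to 01:08      the main wait was enq: TM - contention, with about 4000 sessions waiting on one instance; the DDL audit trail shows that DDL was run against the table used by the INSERT
01:09 to 01:18      a large number of row cache lock waits appeared, most likely dictionary contention following the DDL
01:09 to now        the main waits were enq: US - contention and enq: IV - contention, as many concurrent transactions started waiting for undo segment allocation and lock invalidation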
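A per-minute breakdown like the timeline above can be produced from ASH with a query along these lines. This is a sketch that reads gv$active_session_history directly. The date in the time window is an assumption based on the alert log timestamps. If the in-memory ASH has already aged out, the same columns are available in dba_hist_active_sess_history (with instance_number instead of inst_id).

```sql
-- Count waiting-session samples per instance, minute and event.
-- The date 2020-01-08 is assumed from the alert log; adjust the window as needed.
select inst_id,
       to_char(sample_time, 'hh24:mi') as sample_minute,
       event,
       count(*) as samples
  from gv$active_session_history
 where sample_time between timestamp '2020-01-08 00:30:00'
                       and timestamp '2020-01-08 02:00:00'
   and session_state = 'WAITING'
 group by inst_id, to_char(sample_time, 'hh24:mi'), event
 order by sample_minute, samples desc;
```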

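This is a highly concurrent database. Running DDL on a table while the workload is active invalidates the objects that depend on the table in the library cache. In a RAC environment, the LCK process handles the IV (invalidation) work for the local node's library cache and notifies LCK on the other nodes. A similar situation came up in an earlier case, https://www.anbob.com/archives/2942.html . This database is 12c R2 and I found no matching known bug. For 12.1 and 11.2.0.4 there is Bug 17052702 - LCK hang with 'enq: IV - contention' (Doc ID 2410520.1).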

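12.2 introduces the CURSOR_INVALIDATION parameter, which controls whether DDL uses deferred or immediate cursor invalidation; the default is IMMEDIATE. The window for rolling invalidation is controlled by the _optimizer_invalidation_period parameter, in seconds, with a default of 5 hours. This is the same mechanism dbms_stats used before 12.2. You can set cursor_invalidation=deferred, or add "DEFERRED INVALIDATION" to index DDL, so that SQL child cursors are invalidated in a rolling fashion. Not every situation qualifies for rolling invalidation, however.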

select sql_text,invalidations,loads,parse_calls,executions,first_load_time,last_load_time,last_active_time,is_rolling_invalid from v$sql where xxx
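The two ways of requesting deferred invalidation described above can be sketched as follows. The index and table names are hypothetical and used only for illustration. Whether a given DDL actually qualifies for rolling invalidation depends on the statement, so the result should be confirmed in v$sql.

```sql
-- Session-level: later DDL in this session marks dependent cursors for
-- rolling invalidation instead of invalidating them immediately.
alter session set cursor_invalidation = deferred;

-- Statement-level alternative on index DDL. anbob.idx_tab2_id and
-- anbob.tab2(id) are hypothetical names used only for illustration.
create index anbob.idx_tab2_id on anbob.tab2(id) deferred invalidation;

-- Afterwards, see which child cursors were flagged rather than invalidated.
select sql_id, child_number, invalidations, is_rolling_invalid
  from v$sql
 where is_rolling_invalid <> 'N';
```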

-- Also note that from 12c onward there are 2 LCK processes.

Troubleshooting 11.2.0.4 show error ORA-12592, ORA-3137, ORA-3106


Network/TTC related errors ORA-12592, ORA-3137 and ORA-3106 may be signaled on the SQL*Net TCP/IP transport.
This problem is usually seen under the following circumstances:
– Sending large amounts of data to the database server, for example with sqlldr or expdp
– Database version is 11.2.0.4

Rediscovery Information:
If you see ORA-12592, ORA-3137, ORA-3106 when large size data is passed via SQL*Net
using TCP/IP protocol, and no problem is seen on network itself, you might hit this problem.

Workaround
Set sqlnet.send_timeout to any value other than 0 on the side that receives the large data.
Simply setting the parameter on both server and client works.
A very large value works well as a workaround, so if you want to work around the problem with
minimal change, a large value is recommended.

For database version 11.2.0.4 and above, you may encounter this error during a SQL*Net TCP/IP transport.
To determine if you encountered this bug, check your alert log for
ORA-12592 or ORA-3106
and check your incident trace file for
‘SQL*Net TCP/IP’ or ‘TCP/IP’

If any of these exist, proceed with the solution below:
1.The fix for 18841764 will be included in 12.2 database release. This was not available at the time of writing.
2.Download and apply patch 18841764 if it is available for your platform.
3.Workaround the error by setting sqlnet.send_timeout to any value except 0 on the server and client.
In a previous incident setting it to 600 resolved the issue.
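As a configuration fragment, the workaround amounts to one line in sqlnet.ora on each side. The value 600 is the one reported above as having resolved a previous incident; any non-zero value is said to work.

```
# sqlnet.ora on both client and server (value in seconds)
SQLNET.SEND_TIMEOUT = 600
```

New connections pick up the setting; sessions that are already connected keep the value they started with.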

Note a few prerequisites:
1. The network itself shows no problem
2. Large amounts of data are sent via SQL*Net
3. The database alert log shows the error and the trace shows TCP/IP

What's SQL*Net?

SQL*Net (or Net8) is Oracle's networking software that allows remote data access between programs and the Oracle Database, or among multiple Oracle Databases. Applications and databases can be distributed physically to different machines and continue to communicate as if they were local.

Client configuration
The Oracle Client software is required on workstations and servers that need to connect to remote Oracle Databases.

Server configuration
The cornerstone of SQL*Net is a process called the listener.

Troubleshooting Active Dataguard Hangs waiting for library cache lock on DBINSTANCE namespace


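The environment is Oracle 11.2.0.3, a 2-node RAC on HP-UX. The Active Data Guard side suddenly built up a large gap and redo apply stopped, while many foreground query processes waited on "library cache load lock". Hanganalyze showed the wait chain 'rdbms ipc message'<='library cache lock'. Library cache lock is a parse lock. Bear in mind that in an ADG environment only one process holds the lock in X mode, and that is LGWR; because the database is read-only, all other query sessions parse in S mode. When library cache lock shows up in an ADG environment, it is usually in one of these situations: DDL was run on the primary site, or a system privilege was revoked. In this case restarting the redo apply process did not help, and we had no choice but to restart the standby instance to recover. Later analysis confirmed a related bug in ADG environments, and installing the patch resolves it completely. Of course, if the primary really is running frequent DDL and the standby is simply contending because of it, the patch may not help. The problem is recorded here.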

For this kind of problem it is usually recommended to collect:

hanganalyze, systemstate dump, ASH (v$active_session_history);

-- hanganalyze

Chain 1:
-------------------------------------------------------------------------------
    Oracle session identified by:
    {
                instance: 1 (stdanbob.anbob1)
                   os id: 22931
              process id: 135, oracle@weejar1
              session id: 2479
        session serial #: 1571
    }
    is waiting for 'library cache lock' with wait info:
    {
                      p1: 'handle address'=0xc0000015bff7f678
                      p2: 'lock address'=0xc000001334cffad8
                      p3: '100*mode+namespace'=0x1004a0002
            time in wait: 21.353805 sec
           timeout after: 14 min 38 sec
                 wait id: 2
                blocking: 0 sessions
            wait history:
              * time between current wait and wait #1: 0.143108 sec
              1.       event: 'SQL*Net message from client'
                 time waited: 0.001206 sec
                     wait id: 1               p1: 'driver id'=0x54435000
                                              p2: '#bytes'=0x1
              * time between wait #1 and #2: 0.000013 sec
              2.       event: 'SQL*Net message to client'
                 time waited: 0.000001 sec
                     wait id: 0               p1: 'driver id'=0x54435000
                                              p2: '#bytes'=0x1
    }
    and is blocked by
 => Oracle session identified by:
    {
                instance: 1 (stdanbob.anbob1)
                   os id: 12817
              process id: 32, oracle@weejar1 (LGWR)
              session id: 11265
        session serial #: 1
    }
    which is waiting for 'rdbms ipc message' with wait info:
    {
                      p1: 'timeout'=0x12c
            time in wait: 2.488548 sec
      heur. time in wait: 8.508734 sec
           timeout after: 0.511452 sec
                 wait id: 586406071
                blocking: 7 sessions
            wait history:
              * time between current wait and wait #1: 0.000089 sec
              1.       event: 'rdbms ipc message'
                 time waited: 3.010034 sec
                     wait id: 586406070       p1: 'timeout'=0x12c
              * time between wait #1 and #2: 0.000065 sec
              2.       event: 'rdbms ipc message'
                 time waited: 3.009998 sec
                     wait id: 586406069       p1: 'timeout'=0x12c
              * time between wait #2 and #3: 0.000025 sec
              3.       event: 'rdbms ipc message'
                 time waited: 0.687558 sec
                     wait id: 586406068       p1: 'timeout'=0x44
    }

Chain 1 Signature: 'rdbms ipc message'<='library cache lock'
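The p3 value in the chain above can be split into a namespace and a lock mode. This is a sketch, assuming the commonly used layout for 11g and later in which the low 16 bits hold the mode and the next 16 bits hold the namespace id (despite the `100*mode+namespace` label):

```sql
-- Decode p3 = 0x1004a0002 from the hanganalyze output.
-- Assumed layout: bits 0-15 = lock mode, bits 16-31 = namespace id.
select bitand(trunc(to_number('1004A0002', 'XXXXXXXXXXXX') / 65536), 65535) as namespace_id,
       bitand(to_number('1004A0002', 'XXXXXXXXXXXX'), 65535)               as lock_mode
  from dual;

-- The same decoding applied to live sessions.
select sid,
       bitand(trunc(p3 / 65536), 65535) as namespace_id,
       bitand(p3, 65535)                as lock_mode
  from v$session
 where event = 'library cache lock';
```

Under that assumption the first query gives namespace_id 74 (0x4a) and lock_mode 2, which is consistent with a shared-mode request on the DBINSTANCE namespace named in the title.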

# SSD trace

Blockers
~~~~~~~~

        Above is a list of all the processes. If they are waiting for a resource
        then it will be given in square brackets. Below is a summary of the
        waited upon resources, together with the holder of that resource.
        Notes:
        ~~~~~
         o A process id of '???' implies that the holder was not found in the
           systemstate. (The holder may have released the resource before we
           dumped the state object tree of the blocking process).
         o Lines with 'Enqueue conversion' below can be ignored *unless* 
           other sessions are waiting on that resource too. For more, see 
           http://gbr30026.uk.oracle.com:81/Public/TOOLS/Ass.html#enqcnv)
         o You might see a process blocked on a mutex wait event but also 
           reported as holding the same mutex. You will need to check the 
           processstate dump as we might have been waiting at the start of the 
           process dump but have acquired it before the dump completed.

                    Resource Holder State
                 Mutex 13e4d    10: Blocker
                 Mutex 13e4d    39: Blocker
LOCK: Handle=c0000015bff7f678    32: waiting for 'rdbms ipc message'
      LOAD: c0000013bcdaf710    ??? Blocker
      LOAD: c0000013bcdab630   361: 361: is waiting for 32:
      LOAD: c0000013bcd70ac8   667: 667: is waiting for 32:
      LOAD: c0000013e2cfa070   918: 918: is waiting for 32:
      LOAD: c0000013beff2170   943: 943: is waiting for 32:
      LOAD: c0000013e2d60cd8   954: 954: is waiting for 32:
      LOAD: c0000013e2d60ef8   950: 950: is waiting for 32:
      LOAD: c0000013e2cfa290   704: 704: is waiting for 32:
      LOAD: c0000013e2e5e690   910: 910: is waiting for 32:
      LOAD: c0000013bde32768   650: 650: is waiting for 32:
      LOAD: c0000012fb399e90   988: 988: is waiting for 32:
      LOAD: c0000012fb6ef678    ??? Blocker
      LOAD: c0000013bcdf7880  1008: 1008: is waiting for 32:
      LOAD: c0000013bcdf7660  1014: 1014: is waiting for 32:

PID to SID Mapping
~~~~~~~~~~~~~~~~~~
Pid 10 maps to Sid(s): 3521
Pid 39 maps to Sid(s): 13729 13730
Pid 32 maps to Sid(s): 11265
Pid 361 maps to Sid(s): 14434
Pid 667 maps to Sid(s): 9522 9525
Pid 918 maps to Sid(s): 7770 7746
Pid 943 maps to Sid(s): 16554
Pid 954 maps to Sid(s): 20422
Pid 950 maps to Sid(s): 19019
Pid 704 maps to Sid(s): 27 1
Pid 910 maps to Sid(s): 4934 4953
Pid 650 maps to Sid(s): 3531 3532
Pid 988 maps to Sid(s): 9884
Pid 1008 maps to Sid(s): 16909
Pid 1014 maps to Sid(s): 19037

Warning: The following processes have multiple session state objects and
may not be properly represented above :
    42:   52:  273:                                                            

Object Names
~~~~~~~~~~~~
Mutex 13e4d                                                   
LOCK: Handle=c0000015bff7f678   CURSOR(00):SYS.anbob          
LOAD: c0000013bcdaf710                                        
LOAD: c0000013bcdab630                                        
LOAD: c0000013bcd70ac8                                        
LOAD: c0000013e2cfa070                                        
LOAD: c0000013beff2170                                        
LOAD: c0000013e2d60cd8                                        
LOAD: c0000013e2d60ef8                                        
LOAD: c0000013e2cfa290                                        
LOAD: c0000013e2e5e690                                        
LOAD: c0000013bde32768                                        
LOAD: c0000012fb399e90                                        
LOAD: c0000012fb6ef678                                        
LOAD: c0000013bcdf7880                                        
LOAD: c0000013bcdf7660                                        

Summary of Wait Events Seen (count>10)
~~~~~~~~~~~~~~~~~~~~~~~~~~~
  No wait events seen more than 10 times

  ----------------------------------------
  SO: 0xc0000016536b1cc8, type: 2, owner: 0x0000000000000000, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
   proc=0xc0000016536b1cc8, name=process, file=ksu.h LINE:12616 ID:, pg=0
  (process) Oracle pid:32, ser:1, calls cur/top: 0xc0000016164719f8/0xc0000016164719f8
            flags : (0x6) SYSTEM
            flags2: (0x0),  flags3: (0x10)
            intr error: 0, call error: 0, sess error: 0, txn error 0
            intr queue: empty
    ksudlp FALSE at location: 0
  (post info) last post received: 0 0 35
              last post received-location: ksr2.h LINE:627 ID:ksrpublish
              last process to post me: c0000016236326c0 1 6
              last post sent: 0 0 36
              last post sent-location: ksr2.h LINE:631 ID:ksrmdone
              last process posted by me: c0000016236326c0 1 6
    (latch info) wait_event=0 bits=0
    Process Group: DEFAULT, pseudo proc: 0xc000001653f68b68
    O/S info: user: oracle, term: UNKNOWN, ospid: 12817
    OSD pid info: Unix process pid: 12817, image: oracle@kdyya1 (LGWR)
    Short stack dump:
ksedsts()+544<-ksdxfstk()+48<-ksdxcb()+3216<-sspuser()+688<-<-_pw_wait()+48<-pw_wait()+112<-sskgpwwait()+432<-skgpwwait()+320<-ksliwat()+3328<-kslwaitctx()+304<-kslwait()+192<-ksarcv()+640<-ks
babs()+752<-ksbrdp()+2736<-opirip()+1296<-opidrv()+1152<-sou2o()+256<-opimai_real()+352<-ssthrdmain()+576<-main()+336<-main_opd_entry()+80
    ----------------------------------------
	
    ----------------------------------------
    SO: 0xc0000016552ca578, type: 4, owner: 0xc0000016536b1cc8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
     proc=0xc0000016536b1cc8, name=session, file=ksu.h LINE:12624 ID:, pg=0
    (session) sid: 11265 ser: 1 trans: 0x0000000000000000, creator: 0xc0000016536b1cc8
              flags: (0x51) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
              flags2: (0x409) -/-/INC
              DID: , short-term DID:
              txn branch: 0x0000000000000000
              oct: 0, prv: 0, sql: 0x0000000000000000, psql: 0x0000000000000000, user: 0/SYS
    ksuxds FALSE at location: 0
    service name: SYS$BACKGROUND
    Current Wait Stack:
     0: waiting for 'rdbms ipc message'
        timeout=0xe7, =0x0, =0x0
        wait_id=586406182 seq_num=64592 snap_id=1
        wait times: snap=0.027546 sec, exc=0.027546 sec, total=0.027546 sec
        wait times: max=2.310000 sec, heur=0.027546 sec
        wait counts: calls=1 os=1
        in_wait=1 iflags=0x5a8
    There are 1 sessions blocked by this session.
    Dumping one waiter:
      inst: 1, sid: 20067, ser: 55
      wait event: 'library cache lock'
        p1: 'handle address'=0xc0000015bff7f678
        p2: 'lock address'=0xc0000013dad14ef8
        p3: '100*mode+namespace'=0x1004a0002
      row_wait_obj#: 4294967295, block#: 0, row#: 0, file# 0
      min_blocked_time: 12 secs, waiter_cache_ver: 16092
    Wait State:
      fixed_waits=0 flags=0x22 boundary=0x0000000000000000/-1
    Session Wait History:

  
    ----------------------------------------
    SO: 0xc0000016164719f8, type: 3, owner: 0xc0000016536b1cc8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
     proc=0xc0000016536b1cc8, name=call, file=ksu.h LINE:12620 ID:, pg=0
    (call) sess: cur c0000016552ca578, rec 0, usr c0000016552ca578; flg:20 fl2:1; depth:0
    svpt(xcb:0x0000000000000000 sptn:0x2 uba: 0x00000000.0000.00)
    ksudlc FALSE at location: 0
      ----------------------------------------
      SO: 0xc0000015cff543b8, type: 78, owner: 0xc0000016164719f8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
       proc=0xc0000016536b1cc8, name=LIBRARY OBJECT LOCK, file=kgl.h LINE:8547 ID:, pg=0

      LibraryObjectLock:  Address=c0000015cff543b8 Handle=c0000015bff7f678 Mode=X CanBeBrokenCount=1 Incarnation=1 ExecutionCount=0

        User=c0000016552ca578 Session=c0000016552ca578 ReferenceCount=1 Flags=CNB/[0001] SavepointNum=2
      LibraryHandle:  Address=c0000015bff7f678 Hash=d68d5950 LockMode=X PinMode=0 LoadLockMode=0 Status=0
        ObjectName:  Name=SYS.anbob
          FullHashValue=93b41a89a2321a9b51ff1c32d68d5950 Namespace=DBINSTANCE(74) Type=CURSOR(00) Identifier=1 OwnerIdn=0
        Statistics:  InvalidationCount=0 ExecutionCount=0 LoadCount=0 ActiveLocks=1 TotalLockCount=3424094 TotalPinCount=0
        Counters:  BrokenCount=1 RevocablePointer=1 KeepDependency=0 BucketInUse=7020 HandleInUse=7020 HandleReferenceCount=0
        Concurrency:  DependencyMutex=c0000015bff7f728(0, 0, 0, 0) Mutex=c0000015bff7f7a8(0, 7048705, 1949, 0)
        Flags=RON/PIN/KEP/BSO/[00810003]
        WaitersLists:
          Lock=c0000015bff7f708[c0000013dad14f68,c0000013dad14f68]
          Pin=c0000015bff7f6e8[c0000015bff7f6e8,c0000015bff7f6e8]
          LoadLock=c0000015bff7f760[c0000015bff7f760,c0000015bff7f760]
        Timestamp:
        HandleReference:  Address=c0000015bff7f818 Handle=0000000000000000 Flags=[00]

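The information above matches Bug 16717701 on MOS, which is also covered by its superset, bug 17018214. The SSD shows LGWR holding the “DBINSTANCE” namespace library cache lock in X mode. The temporary workaround suggested on MOS, cancelling media recovery and restarting it, did not work in our case, although in testing, cancelling recovery, flushing the shared pool on all instances and then applying redo again sometimes resolved it. The permanent fix is to install patch 17018214 on the standby database. Note that after the patch is installed the fix is disabled by default on all versions; event 16717701 must be set to enable it. The level is a parameter value, and the readme describes how to calculate it. Setting event 16717701 without the one-off patch installed raises no error, but has no effect either.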

For backports/one-off patches, the fix must be enabled by setting event 16717701 at level 104887600:

alter system set EVENT='16717701 trace name context forever, level 104887600' scope=spfile;

The 104887600 value of the event encodes two things:
1. Timeout – this is the amount of time that LGWR will wait for an X lock before signaling a timeout error and retrying.
2. Sleep duration – this is the amount of time that LGWR will sleep for after having timed out.

Both of the above values are denoted in MILLISECONDS. The event value encodes the sleep duration in its 12 high-order bits and the timeout in its 20 low-order bits. The value can be calculated using the following formula:

value = (S * 1048576) + T

where S = sleep duration in milliseconds
T = timeout in milliseconds

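The value can of course be tuned. Oracle recommends starting with a 30 second timeout and a 100 ms sleep duration, meaning that LGWR waits up to 30 seconds to get the X lock and, if that fails, sleeps for 100 ms before retrying. Using the formula above, the event level works out to: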

value = (100 * 1048576) + 30000 = 104887600
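
To make the bit layout concrete, here is a minimal Python sketch of the formula above. The function names are hypothetical and nothing here comes from the patch readme beyond the formula itself; the sketch only packs the sleep duration above the 20 low-order timeout bits (2**20 = 1048576) and unpacks it again.

```python
SLEEP_SHIFT = 20                       # timeout uses the 20 low-order bits
TIMEOUT_MASK = (1 << SLEEP_SHIFT) - 1  # 1048575
SLEEP_MAX = (1 << 12) - 1              # sleep duration uses 12 bits


def encode_event_level(sleep_ms, timeout_ms):
    """Return (sleep_ms * 1048576) + timeout_ms, checking both ranges."""
    if not 0 <= sleep_ms <= SLEEP_MAX:
        raise ValueError("sleep duration does not fit in 12 bits")
    if not 0 <= timeout_ms <= TIMEOUT_MASK:
        raise ValueError("timeout does not fit in 20 bits")
    return (sleep_ms << SLEEP_SHIFT) | timeout_ms


def decode_event_level(level):
    """Split an event level back into (sleep_ms, timeout_ms)."""
    return level >> SLEEP_SHIFT, level & TIMEOUT_MASK


print(encode_event_level(100, 30000))   # 104887600
print(decode_event_level(104887600))    # (100, 30000)
```

Decoding is handy when you inherit a system with the event already set and want to know which timeout and sleep it was configured with.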

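In a RAC ADG environment the event must be set to the same value on all instances, and it cannot be changed in a rolling fashion.

The bug is fixed in 11.2.0.4 and 12c, which also introduced the hidden parameters “_adg_parselock_timeout” and “_adg_parselock_timeout_sleep” to control the behavior.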

SQL> @pd parselock
Show all parameters and session values from x$ksppi/x$ksppcv...

       NUM N_HEX NAME                               VALUE        DESCRIPTION
---------- ----- ---------------------------------- ------------ ------------------------------------------------------------------
      2345   929 _adg_parselock_timeout             0            timeout for parselock get on ADG in centiseconds
      2347   92B _adg_parselock_timeout_sleep       100          sleep duration after a parselock timeout on ADG in milliseconds

A value of 550 centiseconds (550 * 0.01 = 5.5 seconds) has typically been enough: “_adg_parselock_timeout”=550.
It can also be raised to the recommended 30 seconds: “_adg_parselock_timeout”=3000.

 

Oracle 20c: behavior changes in security

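Oracle 20c-22c is the next major release family after 12c and brings big changes. For example, in line with its positioning as a cloud-architecture database, non-multitenant databases are no longer supported from 20c onward. Getting to know the new release shows where the database is heading and avoids unnecessary trouble, such as having to upgrade again shortly afterwards or rewriting applications because a feature is no longer supported. Security has changed as well; here is a brief record.

1, Traditional auditing deprecated

Oracle Database has offered standard traditional auditing for more than 20 years. Unified Audit was introduced in 12c to provide selective and effective auditing inside the database. It broadened audit coverage to Data Pump, export/import and RMAN actions, and unified the data source in UNIFIED_AUDIT_TRAIL (a read-only sysaudit table). It is not just an extension: it is a more flexible, entirely new architecture that adds in-memory queues and improves audit performance. 12c is a transitional release in which traditional and unified auditing coexist; from 20c on, everything moves to unified auditing.

2, Older algorithms in DBMS_CRYPTO deprecated

Starting with Oracle Database 20c, the older encryption and hashing algorithms in DBMS_CRYPTO are deprecated. These include MD4, MD5 and the RC4-related algorithms. Oracle also recommends replacing the MD4 and MD5 hashing algorithms with newer ones.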

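3, Enterprise User Security (EUS) User Migration Utility deprecated

The Enterprise User Security (EUS) User Migration Utility (UMU) is deprecated in Oracle Database 20c. Use the EUS Manager (EUSM) features instead.

4, TLS 1.0 (Transport Layer Security) deprecated

Starting with Oracle Database 20c, Transport Layer Security protocol version 1.0 (TLS 1.0) is deprecated.

Oracle has deprecated TLS 1.0 in line with security best practices. To meet security requirements, Oracle strongly recommends using TLS 1.2 instead.

5, DBMS_OBFUSCATION_TOOLKIT package desupported

Starting with Oracle Database 20c, the DBMS_OBFUSCATION_TOOLKIT package is desupported and replaced by DBMS_CRYPTO.

6, Oracle ACFS Security (Vault) and ACFS Auditing desupported

Starting with Oracle Grid Infrastructure 20c, Oracle ASM Cluster File System (ACFS) Security (Vault) and ACFS Auditing are desupported.

To manage security and auditing, Oracle recommends using operating system access controls and auditing systems.

7, Desupported ACFS Encryption on Solaris and Windows

Starting with Oracle Database 20c, the Oracle Grid Infrastructure feature Oracle Automatic Storage Management Cluster File System (Oracle ACFS) is not supported on Microsoft Windows.

Oracle ACFS is used for two main use cases:

  • Oracle Database files for Oracle Real Application Clusters (Oracle RAC)
  • Generic files (unstructured data) that need to be shared across multiple hosts.

For Oracle Real Application Clusters files, Oracle recommends Oracle ASM. For generic files, depending on your use case, Oracle recommends moving the files to Oracle Database File System (DBFS) or to Microsoft Windows shares.

8, Vendor clusterware integration with Oracle Clusterware desupported

Starting with Oracle Clusterware 20c, integrating vendor or third-party clusterware with Oracle Clusterware is not supported.

9, Parameters desupported in Oracle Database 20c

UNIFIED_AUDIT_SGA_QUEUE_SIZE is desupported.

10, Case-sensitive password files enforced

Starting with Oracle Database 20c, the IGNORECASE parameter of orapwd files is not supported. All newly created password files are case-sensitive.

11, Connecting from a single client to multiple databases with different certificates

Starting with Oracle Database 20c, you can configure a database client to maintain multiple Secure Sockets Layer (SSL) sessions using different SSL certificates.

12, Ability to set the default tablespace encryption algorithm
Starting with Oracle Database 20c, you can set the dynamic parameter TABLESPACE_ENCRYPTION_DEFAULT_ALGORITHM to define the default encryption algorithm for tablespace creation operations. The supported encryption algorithms are AES128, AES192, AES256 and 3DES168. If TABLESPACE_ENCRYPTION_DEFAULT_ALGORITHM is not set, the default is the same as in earlier releases: AES128.

13, Improved performance for wallets and Oracle Key Vault holding large numbers of TDE keys

Oracle Database 20c improves Transparent Data Encryption (TDE) performance. The enhancement makes wallet loading and key rotation faster in multitenant databases, and lets TDE administration tasks and PDB clone operations run faster.

14, Blockchain tables

A new, centralized, insert-only table type in the database that builds on the trust model of blockchain.

Oracle 20c New Features (1): Blockchain Tables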

 


Alert: Oracle 19c DDL “COMMENT ON TABLE” no longer invalidates SQL cursors (deferred invalidation enhancement)

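In an earlier blog post, “Troubleshooting long wait “enq: US – contention” & “enq: IV – contention” after DDL, alert show “libcache interrupt action by LCK””, I noted that DDL invalidates every SQL cursor related to the object, which can set off a storm of SQL reparsing. SQL is reparsed in several situations: gathering statistics, DDL (ALTER or COMMENT), DCL (GRANT or REVOKE), CREATE INDEX, and shared pool age-out (shared pool too small, flush shared_pool, purging a SQL cursor). Note, however, that the behavior changes between Oracle versions in order to reduce unnecessary cursor invalidation, because the cost of SQL parsing on a busy system is huge. When tuning SQL, on the other hand, we sometimes want a statement to be reparsed after a change, hoping for a better execution plan (maybe), and usually rely on a COMMENT DDL or something like grant select on xx to system, which have relatively little impact. But the behavior of the “COMMENT ON” DDL appears to have changed again in Oracle 19c (unchanged in 12c, not verified in 18c): SQL cursors are no longer invalidated. That is the main point of this post, because COMMENT is exactly what I routinely use to force a hard parse. I have not found any official documentation about it.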

SQL cursor  deferred invalidation

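Before 12c, dbms_stats already had the no_invalidate option, which controls whether SQL cursors are invalidated immediately after a statistics change or on a rolling basis within a time window. The rolling invalidation window is set by the parameter “_optimizer_invalidation_period”, in seconds, with a default of 5 hours. To change it: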

SQL> alter system set "_optimizer_invalidation_period"=15; --15 second
System altered.

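12c also introduced DDL deferred invalidation: some DDL statements accept a “deferred invalidation” clause, and the system parameter cursor_invalidation, which defaults to immediate, can be set to deferred for the same effect.

To tell whether a SQL cursor is about to be invalidated on a rolling basis, 12c added the IS_ROLLING_INVALID column to V$SQL. Its values are Y, N and X, which presumably mean yes, no, and rolling window started.

DEMO

The demo below runs on Oracle 19.2.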

SQL> create table testinv(id int,name varchar2(10));

SQL> Select * from testinv; --repeat 4 times

SQL> select sql_id,sql_text,child_number,invalidations,loads,parse_calls,executions,is_rolling_invalid from v$sql where sql_text like 'S%testinv%' order by sql_text;

SQL_ID        SQL_TEXT                       CHILD_NUMBER INVALIDATIONS      LOADS PARSE_CALLS EXECUTIONS I
------------- ------------------------------ ------------ ------------- ---------- ----------- ---------- -
481ahkqp95agh Select * from testinv                     0             0          1           4          4 N

SQL> comment on table testinv is 'test comment cursor invalidations';
Comment created.

SQL> Select * from testinv; --repeat 2 times

SQL> select sql_id,sql_text,child_number,invalidations,loads,parse_calls,executions,is_rolling_invalid from v$sql where sql_text like 'S%testinv%' order by sql_text;

SQL_ID        SQL_TEXT                       CHILD_NUMBER INVALIDATIONS      LOADS PARSE_CALLS EXECUTIONS I
------------- ------------------------------ ------------ ------------- ---------- ----------- ---------- -
481ahkqp95agh Select * from testinv                     0             0          1           6          6 N

SQL> Select id from testinv; --repeat 3 times 

SQL> select sql_id,sql_text,child_number,invalidations,loads,parse_calls,executions,is_rolling_invalid from v$sql where sql_text like 'S%testinv%' order by sql_text;

SQL_ID        SQL_TEXT                       CHILD_NUMBER INVALIDATIONS      LOADS PARSE_CALLS EXECUTIONS I
------------- ------------------------------ ------------ ------------- ---------- ----------- ---------- -
481ahkqp95agh Select * from testinv                     0             2          2           1          1 N
5pwupv8srwh0b Select id from testinv                    0             1          2           3          3 N

SQL> alter table testinv add c2 int  deferred invalidation;
Table altered.

SQL> Select id from testinv;--repeat 2 times

SQL> select sql_id,sql_text,child_number,invalidations,loads,parse_calls,executions,is_rolling_invalid from v$sql where sql_text like 'S%testinv%' order by sql_text;

SQL_ID        SQL_TEXT                       CHILD_NUMBER INVALIDATIONS      LOADS PARSE_CALLS EXECUTIONS I
------------- ------------------------------ ------------ ------------- ---------- ----------- ---------- -
481ahkqp95agh Select * from testinv                     0             2          2           1          1 N
5pwupv8srwh0b Select id from testinv                    0             2          3           2          2 N

SQL> create index idx_testinv_c1 on testinv(c1)  deferred invalidation;
Index created.

SQL> Select id from testinv;--repeat 2 times

SQL> select sql_id,sql_text,child_number,invalidations,loads,parse_calls,executions,is_rolling_invalid from v$sql where sql_text like 'S%testinv%' order by sql_text;

SQL_ID        SQL_TEXT                       CHILD_NUMBER INVALIDATIONS      LOADS PARSE_CALLS EXECUTIONS I
------------- ------------------------------ ------------ ------------- ---------- ----------- ---------- -
481ahkqp95agh Select * from testinv                     0             2          2           1          1 N
5pwupv8srwh0b Select id from testinv                    0             2          3           4          4 X

Alert: Oracle 12c/18c/19c/20c: do not abuse “_ORACLE_SCRIPT”=true

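“_ORACLE_SCRIPT” is a hidden parameter, so very little documentation describes which switches it turns on; it exists for Oracle's internal maintenance. Many scripts under ORACLE_HOME contain alter session set “_oracle_script”=true, but note that they set it back to false as soon as they are done. Never use the _oracle_script parameter casually just to get around Oracle's default restrictions, and do not change it on a production database unless Oracle asks you to, because it can cause unnecessary trouble later.

Having changed “_oracle_script” may make a later upgrade fail because Oracle's internal constraints were broken, or may lose data when business data is later exported with Data Pump.

The reason is that a user created after set “_oracle_script”=true has its oracle_maintained attribute marked Y. In an earlier post on the Oracle 12c new feature of Oracle-maintained schemas and default users, I noted that oracle_maintained flags Oracle internal schemas. Likewise, objects created while “_oracle_script”=true get oracle_maintained = Y, so a Data Pump export (expdp) treats them as system objects and skips them, which means lost data.

demo
Version: Oracle 19.2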

SQL> show con_name

CON_NAME
------------------------------
CDB$ROOT

SQL> create user anbob identified by oracle;
-- fails with ORA-65096

SQL> ho oerr ora 65096
65096, 00000, "invalid common user or role name"
// *Cause:  An attempt was made to create a common user or role with a name
//          that was not valid for common users or roles. In addition to the
//          usual rules for user and role names, common user and role names
//          must consist only of ASCII characters, and must contain the prefix
//          specified in common_user_prefix parameter.
// *Action: Specify a valid common user or role name.
//


SQL> show parameter common_user
PARAMETER_NAME                                               TYPE        VALUE
------------------------------------------------------------ ----------- ----------------------------------------------------------------------------------------------------
common_user_prefix                                           string      C##
SQL> alter session set "_oracle_script"=true;
Session altered.

SQL> create user anbob identified by oracle;
User created.

SQL> alter session set "_oracle_script"=false;
Session altered.
 
SQL> col username for a10
SQL> select username,account_status,oracle_maintained from dba_users where username='ANBOB';
USERNAME   ACCOUNT_STATUS                   O
---------- -------------------------------- -
ANBOB      OPEN                             Y

Note:
This bypasses the CDB restriction that a common user name must start with common_user_prefix, but the user ends up with oracle_maintained = Y.

SQL> alter user anbob quota unlimited on users;
User altered.

SQL> create table anbob.test as select 1 id from dual;
Table created.
 
SQL> COL OBJECT_NAME FOR A30
SQL> select object_name,object_type,ORACLE_MAINTAINED from dba_objects where owner='ANBOB';

OBJECT_NAME                    OBJECT_TYPE             O
------------------------------ ----------------------- -
TEST                           TABLE                   N

SQL> alter session set "_oracle_script"=true;
Session altered.

SQL> create table anbob.test1 as select 1 id from dual;
Table created.

SQL> alter session set "_oracle_script"=false;
Session altered.

SQL> select object_name,object_type,ORACLE_MAINTAINED from dba_objects where owner='ANBOB';

OBJECT_NAME                    OBJECT_TYPE             O
------------------------------ ----------------------- -
TEST                           TABLE                   N
TEST1                          TABLE                   Y

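Note:
Tables created by the common user anbob in the CDB have ORACLE_MAINTAINED = N by default, but a table created while “_oracle_script”=true has ORACLE_MAINTAINED = Y. Separate testing also showed that alter table add column under “_oracle_script”=true does not change the ORACLE_MAINTAINED value.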

[oracle@anbob19 admin]$ exp anbob/oracle file=anbob.dmp

Export: Release 19.0.0.0.0 - Production on Sat Mar 7 13:34:27 2020
Version 19.2.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.


Connected to: Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.2.0.0.0
Export done in US7ASCII character set and AL16UTF16 NCHAR character set
server uses AL32UTF8 character set (possible charset conversion)
. exporting pre-schema procedural objects and actions
. exporting foreign function library names for user ANBOB
. exporting PUBLIC type synonyms
. exporting private type synonyms
. exporting object type definitions for user ANBOB
About to export ANBOB's objects ...
. exporting database links
. exporting sequence numbers
. exporting cluster definitions
. about to export ANBOB's tables via Conventional Path ...
. . exporting table                           TEST          1 rows exported
EXP-00091: Exporting questionable statistics.
. . exporting table                          TEST1          1 rows exported
EXP-00091: Exporting questionable statistics.
...


[oracle@anbob19 admin]$ expdp \'\/ as sysdba\' directory=oracle_base schemas=anbob dumpfile=anbob.dump

Export: Release 19.0.0.0.0 - Production on Sat Mar 7 13:39:38 2020
Version 19.2.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.
Password:

Connected to: Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Warning: Oracle Data Pump operations are not typically needed when connected to the root or seed of a container database.

Starting "SYS"."SYS_EXPORT_SCHEMA_01":  "/******** AS SYSDBA" directory=oracle_base schemas=anbob dumpfile=anbob.dump
Processing object type SCHEMA_EXPORT/TABLE/TABLE_DATA
Processing object type SCHEMA_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
Processing object type SCHEMA_EXPORT/STATISTICS/MARKER
Processing object type SCHEMA_EXPORT/SYSTEM_GRANT
Processing object type SCHEMA_EXPORT/DEFAULT_ROLE
Processing object type SCHEMA_EXPORT/TABLESPACE_QUOTA
Processing object type SCHEMA_EXPORT/PRE_SCHEMA/PROCACT_SCHEMA
Processing object type SCHEMA_EXPORT/TABLE/TABLE
. . exported "ANBOB"."TEST"                              5.046 KB       1 rows
Master table "SYS"."SYS_EXPORT_SCHEMA_01" successfully loaded/unloaded
******************************************************************************
Dump file set for SYS.SYS_EXPORT_SCHEMA_01 is:
  /u01/app/oracle/anbob.dump
Job "SYS"."SYS_EXPORT_SCHEMA_01" successfully completed at Sat Mar 7 13:42:13 2020 elapsed 0 00:02:15

NOTE:
exp exports all of the common user's tables; expdp does not export the table with ORACLE_MAINTAINED = Y.

SQL> drop user anbob cascade;
drop user anbob cascade
*
ERROR at line 1:
ORA-28014: cannot drop administrative user or role

SQL> ho oerr ora 28014
28014, 00000, "cannot drop administrative user or role"
// *Cause:    An attempt was made to drop an administrative user or role.
//            An administrative user or role can be dropped only by SYS during
//            migration mode.
// *Action:   Drop the administrative user or role during migration mode.
//

SQL> alter session set "_oracle_script"=true;
Session altered.

SQL> drop user anbob cascade;
User dropped.

SQL> alter session set "_oracle_script"=false;
Session altered.

Note:
Dropping the common user created in the CDB raises ORA-28014; it has to be dropped with “_oracle_script”=true set.

Troubleshooting performance event ‘enq: CF – contention’

CF enqueues are control file enqueues, which occur during parallel access to the control file. CF is a system enqueue that is generally held only for a very short time. CF locks serialize control file transactions as well as reads and writes on shared portions of the control file. Taking the control file enqueue is an expensive operation; it is taken for operations such as updating checkpoint SCNs for datafiles and for NOLOGGING operations.

P1: name|mode, e.g. 0x43460005: 0x43 0x46 is ASCII "CF" and mode 5 is SSX

P2: corresponds to 0.

In which modes can I hold or request an enqueue?

  • 0: Enqueue is not held or requested
  • 1: null (NULL)
  • 2: row-S (SS)
  • 3: row-X (SX)
  • 4: share (S)
  • 5: S/Row-X (SSX)
  • 6: exclusive (X)
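
As an illustration of the name|mode layout, here is a small Python sketch that splits a P1 value into the enqueue name and the mode from the list above. The function name and the dictionary are my own; the layout assumed is the one in the example (two ASCII characters in the high-order bytes, mode in the low-order 16 bits).

```python
# Mode numbers and names as listed above.
ENQUEUE_MODES = {
    0: "not held or requested",
    1: "null (NULL)",
    2: "row-S (SS)",
    3: "row-X (SX)",
    4: "share (S)",
    5: "S/Row-X (SSX)",
    6: "exclusive (X)",
}


def decode_enqueue_p1(p1):
    """Split an enqueue P1 value into (name, mode number, mode name)."""
    name = chr((p1 >> 24) & 0xFF) + chr((p1 >> 16) & 0xFF)
    mode = p1 & 0xFFFF
    return name, mode, ENQUEUE_MODES.get(mode, "unknown")


print(decode_enqueue_p1(0x43460005))   # ('CF', 5, 'S/Row-X (SSX)')
```

v$session reports P1 in decimal, so the same value shows up there as 1128660997; passing that integer gives the same result.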

The CF enqueue can be seen during any action that requires reading the control file. Typically CF locks are allocated for a very brief time and are used when:

.checkpointing
.switching logfiles
.archiving redologs
.performing crash recovery
.logfile manipulation
.begin/end hot backup
.DML access for NOLOGGING objects

Diag Method

1, Hanganalyze

2, Systemstate dump
Search for 'enq: CF – contention' in the systemstate dump to identify the holding process: which call/top, which enqueue, and in which mode is it held?

SO: 0x2341a50718, type: 8, owner: 0x2080dec768, flag: INIT/-/-/0x00 if: 0x1 c: 0x1
         proc=0x23012ace68, name=enqueue, file=ksq1.h LINE:380 ID:, pg=0
        (enqueue) CF-00000000-00000000  DID: 0001-015C-00009318
        lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  res_flag: 0x3
        mode: SSX, lock_flag: 0x10, lock: 0x2341a50770, res: 0x23319cb488
        own: 0x230176cc20, sess: 0x230176cc20, proc: 0x23012ace68, prv: 0x23419b9838
        slk: 0x23128d3d90

3, Create a table from v$active_session_history

4, Find the wait chain (Tanel Poder's scripts)

SQL> @ash/ash_wait_chains program2||event2 1=1 sysdate-1/24/12 sysdate

5, Find the holder

select l.sid, p.program, p.pid, p.spid, s.username, s.terminal, s.module, s.action, s.event, s.wait_time, s.seconds_in_wait, s.state
from v$lock l, v$session s, v$process p
where l.sid = s.sid
and s.paddr = p.addr
and l.type='CF'
and l.lmode >= 5;

1. If the holder is a background process, typically LGWR, CKPT or ARCn, and it holds the enqueue for a long period of time:
Check whether the redo logs are sized adequately. Typically you want a log switch about every 30 minutes. Also verify checkpointing parameters such as fast_start_mttr_target.

2. If the holder is a user session (not a background process), the holder keeps changing, and the holder's wait event is ‘control file parallel write’:
The contention for the CF enqueue is most likely caused by DML on a NOLOGGING object.
The solution is to enable LOGGING, or to set event 10359 to level 1 to skip updating the unrecoverable SCNs in the control file.

NOTE! Be aware of bug 12360160, which affects 11.2.0.2: setting event 10359 has no effect if DB_UNRECOVERABLE_SCN_TRACKING is set to false at startup.

3. Check whether the archive destinations (log_archive_dest_n) are accessible; you may need to involve the system or storage administrators.
If the archive destinations are on an NFS file system, make sure NFS is healthy, because NFS problems can hang log switches, which leads to CF enqueue contention with LGWR or ARCn as the lock holder.

If the RMAN snapshot control file is stored on an NFS file system and access to NFS is extremely slow, the control file stays locked for so long that the instance can fail because it cannot get the control file enqueue. The database instance may also hang when RMAN starts.

4. If the holder is MMON or M000:
The holder's wait event is ‘control file parallel write’. When flashback database is enabled, a large number of flashback logs in the FRA can make MMON hold the CF lock for a long time while it monitors FRA usage. Queries against v$flash_recovery_area_usage or v$recovery_file_dest also take a long time to complete. The solution is to clean up the flashback logs.
RMAN feature usage tracking can also cause CF enqueue contention, with the M000 process holding CF for a long time. The stack is likely to show kewfeus_execute_usage_sampling with krbc_rman_usage also present. Check bug 16485447.

5, If the holder is RMAN:
The RMAN process reads the control file sequentially for a long time, so the CF enqueue is held in shared mode and not released. You can kill the RMAN process. There are many causes of CF enqueue contention, and a CF enqueue blocked by an RMAN process is one real-world example. I/O may be slow, since the RMAN process has to wait a long time to read the control file.

6, The control file itself can be very large, holding a huge number of history records, so processes that access it take a long time to finish.

Query V$CONTROLFILE_RECORD_SECTION to find record sections with a huge RECORD_SIZE and try to reduce the space they occupy. Check the CONTROL_FILE_RECORD_KEEP_TIME parameter and make sure it is not set to a large value. The default is 7.
When the control file has grown huge, it needs to be recreated as part of the fix.

7, When the CF enqueue has been held for more than 900 seconds without being released, the following error appears in the database alert log:
ORA-00494: enqueue [CF] held for too long (more than 900 seconds). This may be accompanied by an instance crash caused by the prolonged CF enqueue contention.

_controlfile_enqueue_timeout: sets the timeout limit for the CF enqueue. The default is 900 seconds.
_kill_controlfile_enqueue_blocker: enables killing the control file enqueue blocker on timeout. The default is TRUE. If the holder is a background process, for example LGWR, killing the holder crashes the instance. Set it to FALSE to disable this mechanism.
_kill_enqueue_blocker: the default is 2. With a value of 1, an enqueue holder that is a background process is not killed, so the instance does not crash. If the enqueue holder is not a background process, the new 10.2.0.4 mechanism still tries to kill it.

8, From LGWR tracefile messages, you can interpret that your I/O subsystem might not be performing as expected. For every lgwr write to disk which takes more than 500ms, a warning is written to the lgwr tracefile.

Alert: Patch 28553832 (11g R2 Extended Support patch) must be applied before upgrading to 19c

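The releases that can be upgraded directly to Oracle Database 19c are 11.2.0.4, 12.1.0.2, 12.2.0.1 and 18c. While recently testing an upgrade plan from 11g (11.2.0.4) to 19c, we ran into the issue in the title. Surprised? The GI upgrade to 19c checks whether patch 28553832 is installed, and the check cannot be skipped. This is confirmed by Patches to apply before upgrading Oracle GI and DB to 19c or downgrading to previous release (Doc ID 2539751.1).

If your version is one of the following: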

–18.6.0 (April 2019) or newer
–12.2.0.1.190416 (April 2019) or newer
–12.1.0.2.190416 (April 2019) or newer
–11.2.0.4.191015 (October 2019) or newer

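you do not need to install the patch, because the release already contains it. The patch is also included by default in 19.1. If your version is below the levels listed above, patch 28553832 must be applied before the upgrade.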
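
The version list above can be turned into a quick check. This is only a sketch in Python under my own assumptions: the function name is made up, and version strings are expected in the dotted form used in the list (for example 11.2.0.4.191015). It does not query OPatch or the database.

```python
# (release family prefix, first level that already contains the fix)
FLOORS = [
    ((18,), (18, 6, 0)),
    ((12, 2, 0, 1), (12, 2, 0, 1, 190416)),
    ((12, 1, 0, 2), (12, 1, 0, 2, 190416)),
    ((11, 2, 0, 4), (11, 2, 0, 4, 191015)),
]


def needs_patch_28553832(version):
    """Return True if the given version is below the level that includes the fix."""
    v = tuple(int(part) for part in version.split("."))
    if v[0] >= 19:
        return False  # the text says the fix is included from 19.1
    for family, floor in FLOORS:
        if v[:len(family)] == family:
            return v < floor
    raise ValueError("release not in the direct-upgrade list: " + version)


for ver in ("11.2.0.4.190115", "11.2.0.4.191015", "18.5.0", "12.2.0.1.190416"):
    print(ver, needs_patch_28553832(ver))
```

A bare 11.2.0.4 with no PSU date compares lower than 11.2.0.4.191015 and is therefore reported as needing the patch.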

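So what problem does patch 28553832 fix?

Bug 28553832 Process Failed to Run in Real-time priority If Docker Engine is Installed

If the Docker Engine RPM is installed, the OCSSD clusterware process cannot start with real-time (RT) priority and reports a startup failure. The GI alert log may show:
CRS-1726: Process failed to run in real-time priority.

How to check whether Docker is installed?

# rpm -qa|grep docker

Solution:

Uninstall the Docker Engine RPM or apply patch 28553832.

Back to our case: the Docker Engine RPM is not installed in this environment, yet the check still complains, which is clearly illogical.

So can patch 28553832 be downloaded?

When trying to download patch 28553832 for 11.2.0.4 from MOS, every result turns out to be a Software Extended Support patch, which cannot be downloaded unless Extended Support has been purchased. This is one more reason to upgrade while the software is still within its support lifecycle. The one-off patch first appeared in 11.2.0.4.GIPSU.191015, and that PSU is also a Software Extended Support patch. Installing 19c GI from scratch instead of upgrading from 11g GI gets around the check, but Note 2539751.1 states that both the GI upgrade and the DB upgrade need the patch.

This makes a direct upgrade from 11g R2 to 19c less easy, especially for plans that upgrade from 11.2.0.4 to 19c by way of Data Guard.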

Alert: SEC_CASE_SENSITIVE_LOGON and ORA-1017 after upgrade to 12.2, 18c, 19c

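Starting with Oracle 12c R2, setting SEC_CASE_SENSITIVE_LOGON=FALSE is treated as deprecated, and the default value in 12.2 is TRUE. If the parameter is carried over from an earlier release during an upgrade, things get complicated: you may hit ORA-1017 (invalid password), and it gets even more interesting when a static listener is configured.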

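After upgrading to 12c, if you get ORA-1017 although the password is unchanged and correct, or the error persists after changing the password, first check DBA_USERS.PASSWORD_VERSIONS and the SQLNET.ALLOWED_LOGON_VERSION_SERVER and SQLNET.ALLOWED_LOGON_VERSION_CLIENT parameters in the sqlnet.ora of the DB home.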
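
These first checks can be scripted. The sketch below uses two hypothetical helper names: `print_logon_check_sql` only prints the SQL (so it can be reviewed before being piped into SQL*Plus), and `show_allowed_logon_versions` prints the ALLOWED_LOGON_VERSION lines of whichever sqlnet.ora file you point it at. It assumes the file is the one the database home actually uses (normally under `$ORACLE_HOME/network/admin`, or the directory named by `TNS_ADMIN`).

```shell
# Hypothetical helper: prints the SQL for the checks described above.
print_logon_check_sql() {
    cat <<'EOF'
SELECT username, account_status, password_versions
  FROM dba_users
 ORDER BY username;
SELECT name, value
  FROM v$parameter
 WHERE name = 'sec_case_sensitive_logon';
EOF
}

# Hypothetical helper: prints the ALLOWED_LOGON_VERSION_* settings found in
# the sqlnet.ora file given as $1. No output means the defaults apply.
show_allowed_logon_versions() {
    grep -i '^[[:space:]]*SQLNET\.ALLOWED_LOGON_VERSION' "$1"
}
```

On the database server this could be used as `print_logon_check_sql | sqlplus -s "/ as sysdba"` followed by `show_allowed_logon_versions "$ORACLE_HOME/network/admin/sqlnet.ora"`.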

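Before 12.2, for example in 12.1, PASSWORD_VERSIONS may be "10G 11G" or "10G 11G 12C". In 12.2 the default of SQLNET.ALLOWED_LOGON_VERSION_SERVER in sqlnet.ora has become 12. After an upgrade the existing password_versions values do not change, whether the data was moved with expdp or RMAN, but users created manually afterwards get password_versions "11G 12C". If SQLNET.ALLOWED_LOGON_VERSION_SERVER=12a, users created or altered afterwards have only "12C", and clients below 11.2.0.4 then fail at login with: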

ORA-28040: No matching authentication protocol

So ALLOWED_LOGON_VERSION_SERVER in sqlnet.ora determines the password_versions that result when ALTER USER changes a password.

In a 12c R2 database, if SEC_CASE_SENSITIVE_LOGON is set to FALSE and a user came from 11g with password_versions "10G 11G", connecting fails with:

ORA-01017: invalid username/password; logon denied

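Even if the password is now changed with ALTER USER and DBA_USERS.PASSWORD_VERSIONS shows "11G 12C", the user still cannot log in and still gets ORA-1017, because SEC_CASE_SENSITIVE_LOGON=FALSE conflicts with the default SQL*Net authentication protocol (SQLNET.ALLOWED_LOGON_VERSION_SERVER=12). The simplest fix is to set SEC_CASE_SENSITIVE_LOGON=TRUE.

When ALLOWED_LOGON_VERSION_SERVER is 12 or 12a, make sure SEC_CASE_SENSITIVE_LOGON is not set to FALSE. With 12 or 12a only the more secure, case-sensitive password verification is supported, so SEC_CASE_SENSITIVE_LOGON=FALSE prevents all users from logging in, except SYSDBA.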

  • Oracle Database 12.1: SQLNET.ALLOWED_LOGON_VERSION_SERVER defaults to 11
  • Oracle Database 12.2 and later: SQLNET.ALLOWED_LOGON_VERSION_SERVER defaults to 12
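
The conflict rule described in this section is simple enough to encode as a pre-flight check. This is an illustrative sketch only: `logon_config_conflict` is a hypothetical name, and it takes the two settings as arguments instead of reading them from the database or from sqlnet.ora.

```shell
# Hypothetical helper: $1 = SEC_CASE_SENSITIVE_LOGON value,
# $2 = SQLNET.ALLOWED_LOGON_VERSION_SERVER value (pass 12 when relying on
# the 12.2+ default, 11 for the 12.1 default).
logon_config_conflict() {
    local sec_case allowed
    # Normalize case so that false/FALSE and 12a/12A are treated alike.
    sec_case=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    allowed=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    case "$allowed" in
        12|12a)
            if [ "$sec_case" = "FALSE" ]; then
                echo "conflict"   # non-SYSDBA logins are expected to fail with ORA-1017
                return 0
            fi
            ;;
    esac
    echo "ok"
}
```

For instance, `logon_config_conflict FALSE 12` flags the combination that locks out ordinary users, while `logon_config_conflict FALSE 11` does not.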

How to resolve ORA-1017 in 12c R2 and later

1,SEC_CASE_SENSITIVE_LOGON=TRUE

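As described above, set SEC_CASE_SENSITIVE_LOGON=TRUE. This is also the direction Oracle recommends: for better security, set the parameter to TRUE, set ALLOWED_LOGON_VERSION_SERVER to 12 or 12a in sqlnet.ora, then reset the passwords so that the 10G and 11G entries disappear from password_versions. Clients older than 12c will then be unable to connect.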

2,SQLNET.ALLOWED_LOGON_VERSION_SERVER

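In sqlnet.ora, set SQLNET.ALLOWED_LOGON_VERSION_SERVER to a value below 12 and restart the listener. If the user's current password_versions does not include 10G, the password also has to be reset.

If the client is Oracle Database 10g, it receives "ORA-03134: Connections to this server version are no longer supported". To allow the connection, set SQLNET.ALLOWED_LOGON_VERSION_SERVER to 8 and make sure DBA_USERS.PASSWORD_VERSIONS for the account includes 10G. The account's password may need to be reset.

For more, see the Database Net Services Reference.

3, Configuring a static listener on a non-1521 port in 12c

I also found that when the database has SEC_CASE_SENSITIVE_LOGON=FALSE and 12c R2 runs several listeners on non-1521 ports, with a static listener whose port is not included in local_listener, login still fails with ORA-1017 even if password_versions includes 10G and SQLNET.ALLOWED_LOGON_VERSION_SERVER is below 12.

In this case, either setting SEC_CASE_SENSITIVE_LOGON=TRUE or adding the static listener's port to local_listener resolves the problem. My guess is that the listener falls back to the 12.2 default behavior: when local_listener is not configured for it, the listener does not know the database's parameter settings.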
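
To see whether a static listener port is covered by local_listener, the parameter value can be inspected for that port. The sketch below is an assumption-laden illustration: `port_in_local_listener` is a hypothetical name, and it only understands the address-list form of local_listener (for example `(ADDRESS=(PROTOCOL=TCP)(HOST=...)(PORT=1522))`). If local_listener holds a TNS alias, the alias has to be resolved first.

```shell
# Hypothetical helper: $1 = value of local_listener in address-list form,
# $2 = port of the static listener.
port_in_local_listener() {
    # Match "(PORT=<port>)" with optional spaces around "=", case-insensitively.
    if printf '%s' "$1" | grep -qi "(PORT[[:space:]]*=[[:space:]]*$2)"; then
        echo "registered"
    else
        echo "missing"
    fi
}
```

If the function prints "missing" for the static listener's port, that matches the situation described above, and adding the address to local_listener is one of the two fixes the text mentions.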

 

Notes on upgrading to 12c: connection failures with ORA-28040 and ORA-1017

A summary of several new password features in Oracle 12c

Related MOS notes

12.2: ORA-28040 Followed by ORA-1017 When Client is Under Version 12. ( Doc ID 2296947.1 )
The new Exclusive Mode default for password-based authentication in Oracle 12.2 conflicts with case-insensitive password configurations. All user login fails with ORA-1017 after upgrade to 12.2 ( Doc ID 2075401.1 )
12c Database Alert.log File Shows The Message: Using Deprecated SQLNET.ALLOWED_LOGON_VERSION Parameter ( Doc ID 2111876.1 )
