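When the CPU on a database host is exhausted, Oracle wants its few core background processes to get CPU at the highest possible priority, because high CPU usage lengthens I/O response times, adds network latency, and indirectly hurts overall database performance. Since Oracle 10g the hidden parameter _high_priority_processes has controlled which processes run at higher priority, and in 19c the parameter _highest_priority_processes controls which run at the highest priority. In 10.2, Oracle by default gives high priority to the core RAC process LMS*; in 11g, to LMS*|VKTM; in 19c, VKTM has the highest priority, and higher priority is configured for more processes: LMS*|LM*|LCK0|GCR*|CKPT|DBRM|RMS0|LGWR|CR*|RMV*. As far as I remember, a bug before 10.2.0.3 could cause these processes to use too much CPU; current mainstream versions have few such problems.
Recently a customer saw pronounced GC (global cache) problems whenever CPU usage exceeded 90%. While checking LMS I noticed that it was not running in RT mode. LMS behaves somewhat differently in 19c, so here are some brief notes.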
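On Linux, the kernel scheduling policy of a process falls into one of three classes:
TS: SCHED_OTHER (SCHED_NORMAL), the time-sharing policy and the general-purpose default;
FF: SCHED_FIFO, a real-time policy, first in first out;
RR: SCHED_RR, a real-time policy, round robin with time slices;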
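The three labels above are what ps prints in its CLS column. As a minimal sketch, the mapping can be written down in Python; the function names `ps_class` and `is_realtime` are made up for illustration, and the numbers are the Linux values of SCHED_OTHER, SCHED_FIFO and SCHED_RR.

```python
import os

# ps CLS label for each Linux scheduling policy number.
# Linux defines SCHED_OTHER=0, SCHED_FIFO=1, SCHED_RR=2.
PS_CLASS = {0: "TS", 1: "FF", 2: "RR"}


def ps_class(policy: int) -> str:
    """Return the ps CLS label for a policy number, or '?' for any other policy."""
    return PS_CLASS.get(policy, "?")


def is_realtime(policy: int) -> bool:
    """SCHED_FIFO and SCHED_RR are the two real-time policies."""
    return policy in (1, 2)


if __name__ == "__main__" and hasattr(os, "sched_getscheduler"):
    # Pid 0 means "the calling process".
    print(ps_class(os.sched_getscheduler(0)))
```

Run from an ordinary shell, the script reports the policy of the Python process itself, which is normally the time-sharing class.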
First, a healthy environment: Oracle 19c, 2-node RAC on RHEL 7.8
# db alert log
Starting background process CLMN
CLMN started with pid=3, OS id=28714
Starting background process PSP0
PSP0 started with pid=4, OS id=28731
Starting background process IPC0
2021-03-23 10:07:32.440000 +08:00
IPC0 started with pid=5, OS id=29420
Starting background process VKTM
Starting background process GEN0
VKTM started with pid=6, OS id=29445 at elevated (RT) priority
VKTM running at (1)millisec precision with DBRM quantum (100)ms
Starting background process MMAN
Starting background process LMD1
LMD0 started with pid=23, OS id=29631
* Load Monitor used for high load check
* New Low - High Load Threshold Range = [130560 - 174080]
LMS1 started with pid=26, OS id=29640_29663 at elevated (RT) priority
LMS0 started with pid=24, OS id=29635_29662 at elevated (RT) priority
LMS2 started with pid=28, OS id=29646_29666 at elevated (RT) priority
Starting background process LMD2
LMD1 started with pid=36, OS id=29659
LMS3 started with pid=30, OS id=29649_29667 at elevated (RT) priority
LMS4 started with pid=32, OS id=29651_29672 at elevated (RT) priority
LMS5 started with pid=34, OS id=29653_29677 at elevated (RT) priority
Starting background process LMD3
LMD2 started with pid=37, OS id=29681
LMD3 started with pid=38, OS id=29686
Starting background process RMS0
RMS0 started with pid=39, OS id=29689

oracle@anbob_com:/home/oracle> ps -efc|grep vktm
grid      34874      1 RR  41 Jun03 ?        00:06:20 asm_vktm_+ASM1
oracle    42358      1 RR  41 Jun03 ?        00:05:24 ora_vktm_anbob1
grid      58462      1 RR  41 Jun03 ?        00:06:18 mdb_vktm_-MGMTDB

Note: VKTM runs in RR mode.

oracle@anbob_com:/home/oracle> ps -efc|grep lms
oracle    35148  90946 TS  19 16:02 pts/3    00:00:00 grep --color=auto lms
oracle    66573      1 TS  19 May21 ?        04:32:32 ora_lms0_anbob1
oracle    66576      1 TS  19 May21 ?        04:29:41 ora_lms1_anbob1
oracle    66578      1 TS  19 May21 ?        04:26:33 ora_lms2_anbob1
oracle    66581      1 TS  19 May21 ?        04:26:51 ora_lms3_anbob1
oracle    66586      1 TS  19 May21 ?        04:25:38 ora_lms4_anbob1
oracle    66589      1 TS  19 May21 ?        04:28:44 ora_lms5_anbob1
oracle    66596      1 TS  19 May21 ?        04:25:44 ora_lms6_anbob1
oracle    66599      1 TS  19 May21 ?        04:50:02 ora_lms7_anbob1
oracle    66603      1 TS  19 May21 ?        04:22:42 ora_lms8_anbob1
oracle    66609      1 TS  19 May21 ?        04:21:31 ora_lms9_anbob1
oracle    66615      1 TS  19 May21 ?        04:25:41 ora_lmsa_anbob1
oracle    66620      1 TS  19 May21 ?        04:29:43 ora_lmsb_anbob1
grid     129022      1 TS  19 May14 ?        00:36:49 asm_lms0_+ASM1
Note:
LMS, however, still shows as TS here. In 12c and earlier releases, ps shows LMS in RR mode:
# sqlplus -V
SQL*Plus: Release 12.2.0.1.0 Production

# ps -eLfc |head -n 1;ps -eLfc|grep lms
UID         PID   PPID    LWP NLWP CLS PRI STIME TTY             TIME CMD
grid      14661      1  14661    1 RR   41  2019 ?         1-08:14:40 asm_lms0_+ASM1
oracle    62106      1  62106    1 RR   41  2019 ?        17-22:45:22 ora_lms0_weejar1
oracle    62109      1  62109    1 RR   41  2019 ?        18-10:30:26 ora_lms1_weejar1
oracle    62111      1  62111    1 RR   41  2019 ?        18-00:13:16 ora_lms2_weejar1
oracle    62113      1  62113    1 RR   41  2019 ?        17-22:02:20 ora_lms3_weejar1
oracle    62115      1  62115    1 RR   41  2019 ?        17-22:07:53 ora_lms4_weejar1
# Check the oradism file
oracle@anbob_com:/home/oracle> ls -l $ORACLE_HOME/bin/oradism
-rwsr-x--- 1 root oinstall 147848 Apr 17 2019 /oracle/app/oracle/product/19c/db_1/bin/oradism
OK: the file is owned by root and has the setuid bit set.
Note:
For 10gR2 and 11gR1 installations, verify that the oradism executable matches the following ownership and permissions “-rwsr-sr-x 1 root dba oradism” and make sure the lms is running in Real Time mode.
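The two properties that matter in the listing above are the root owner and the setuid bit (the `s` in `-rwsr-x---`). The following is only a sketch of that check in Python; `oradism_problems` and `check_oradism` are hypothetical helper names, and the sketch does not look at the group or the remaining permission bits.

```python
import os
import stat


def oradism_problems(st_uid: int, st_mode: int) -> list:
    """Return readable problems; an empty list means both checks passed."""
    problems = []
    if st_uid != 0:
        problems.append("owner is not root")
    if not st_mode & stat.S_ISUID:
        problems.append("setuid bit is not set")
    return problems


def check_oradism(path: str) -> list:
    """Stat the file at `path` and apply the two checks above."""
    st = os.stat(path)
    return oradism_problems(st.st_uid, st.st_mode)
```

A typical call would be `check_oradism(os.path.join(os.environ["ORACLE_HOME"], "bin", "oradism"))`, run as a user that is allowed to stat the file.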
# Check the mount point of the ORACLE_HOME file system
oracle@anbob_com:/home/oracle> cat /proc/mounts|grep oracle
/dev/mapper/fusioncube-oracle /oracle ext4 rw,relatime,stripe=16,data=ordered 0 0
OK.
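The post does not say what exactly to look for in the mount line. One option worth checking is `nosuid`, because the kernel ignores setuid bits on a file system mounted with it, which would defeat a root-owned setuid oradism. The sketch below assumes that this is the purpose of the check; `mount_options` is a hypothetical helper that finds the mount entry covering a path in /proc/mounts-style text.

```python
def mount_options(mounts_text: str, path: str):
    """Return (mount_point, options) for the longest mount point containing `path`.

    Returns None when no line of `mounts_text` covers the path.
    """
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # not a "device mountpoint fstype options ..." line
        mount_point, options = fields[1], fields[3].split(",")
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) > len(best[0]):
                best = (mount_point, options)
    return best


if __name__ == "__main__":
    line = "/dev/mapper/fusioncube-oracle /oracle ext4 rw,relatime,stripe=16,data=ordered 0 0"
    point, opts = mount_options(line, "/oracle/app/oracle/product/19c/db_1/bin/oradism")
    print(point, "nosuid" in opts)
```

For the mount line shown above, the options contain no `nosuid`, which matches the "OK" verdict.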
# LMS in AWR
RAC Statistics
| | Begin | End | |
|---|---|---|---|
| Number of Instances: | 2 | 2 | |
| Number of LMS’s: | 12 | 12 | |
| Number of realtime LMS’s: | 12 | 12 | (0 priority changes) |
# Check the background processes
SQL> select 'LMS', INST_ID,PRIORITY,COUNT(*) TOTAL FROM GV$BGPROCESS where name like 'LMS%' GROUP BY INST_ID,PRIORITY ;

'LMS'     INST_ID PRIORITY              TOTAL
------ ---------- ---------------- ----------
LMS             1 RT                       12
LMS             2 RT                       12
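Everything indicates that the LMS processes are currently in RT mode, yet ps still reports them as TS. Is this a display problem, or did Oracle change something? It did: starting with 18c, LMS runs in threaded mode, so the scheduling class has to be checked per thread (LWP):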
oracle@anbob_com:/home/oracle> ps -eLfc |head -n 1;ps -eLfc|grep lms
UID         PID   PPID    LWP NLWP CLS PRI STIME TTY          TIME CMD
oracle    66573      1  66573    4 TS   19 May21 ?        00:00:08 ora_lms0_anbob1
oracle    66573      1  66580    4 RR   41 May21 ?        03:15:29 ora_lms0_anbob1
oracle    66573      1  67219    4 TS   19 May21 ?        00:23:08 ora_lms0_anbob1
oracle    66573      1  67240    4 TS   19 May21 ?        00:53:41 ora_lms0_anbob1
oracle    66576      1  66576    4 TS   19 May21 ?        00:00:08 ora_lms1_anbob1
oracle    66576      1  66582    4 RR   41 May21 ?        03:12:36 ora_lms1_anbob1
oracle    66576      1  67270    4 TS   19 May21 ?        00:23:09 ora_lms1_anbob1
oracle    66576      1  67301    4 TS   19 May21 ?        00:53:43 ora_lms1_anbob1
oracle    66578      1  66578    4 TS   19 May21 ?        00:00:08 ora_lms2_anbob1
oracle    66578      1  66591    4 RR   41 May21 ?        03:10:10 ora_lms2_anbob1
oracle    66578      1  67339    4 TS   19 May21 ?        00:22:52 ora_lms2_anbob1
...
OK: each LMS process has four threads, one of which runs as RR while the others stay TS.
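Reading the per-thread listing by eye gets tedious with many LMS processes. As a rough sketch, the `ps -eLfc` output can be grouped by process in Python to flag any LMS process that has no real-time thread at all. The function names are hypothetical, and the parser assumes the eleven-column layout shown in the header line above.

```python
from collections import defaultdict


def classes_by_process(ps_text: str, pattern: str = "_lms") -> dict:
    """Map 'PID command' to the list of (LWP, CLS) pairs found in `ps -eLfc` output."""
    result = defaultdict(list)
    for line in ps_text.splitlines():
        fields = line.split()
        # Columns: UID PID PPID LWP NLWP CLS PRI STIME TTY TIME CMD
        if len(fields) < 11 or fields[0] == "UID":
            continue
        pid, lwp, cls, cmd = fields[1], fields[3], fields[5], fields[10]
        if pattern in cmd:
            result[f"{pid} {cmd}"].append((lwp, cls))
    return dict(result)


def processes_without_rt_thread(ps_text: str) -> list:
    """Return the matching processes in which no thread runs as RR or FF."""
    return [name for name, threads in classes_by_process(ps_text).items()
            if not any(cls in ("RR", "FF") for _, cls in threads)]
```

Fed the output of `ps -eLfc`, `processes_without_rt_thread` returns an empty list for an environment like the one above, where every LMS process has one RR thread.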
Now a problem environment: Oracle 19.4, 2-node RAC on RHEL 7.5
RAC Statistics
| | Begin | End | |
|---|---|---|---|
| Number of Instances: | 2 | 2 | |
| Number of LMS’s: | 40 | 40 | |
| Number of realtime LMS’s: | 0 | 0 | (0 priority changes) |
SQL> select * from v$bgprocess where name like 'LMS%';

PADDR              PSERIAL# NAME  DESCRIPTION                      PRIORITY     CON_ID
---------------- ---------- ----- -------------------------------- -------- ----------
0000001E01B628A0          1 LMS0  global cache service process     TS                0
0000001E01B65360          1 LMS7  global cache service process     TS                0
0000001E01B67E20          1 LMSE  global cache service process     TS                0
0000001E01B6A8E0          1 LMSL  global cache service process     TS                0
0000001E01B6D3A0          1 LMSS  global cache service process     TS                0
0000001E01B6FE60          1 LMSZ  global cache service process     TS                0
0000001E21AC8498          1 LMS3  global cache service process     TS                0
0000001E21ACAF58          1 LMSA  global cache service process     TS                0
0000001E21ACDA18          1 LMSH  global cache service process     TS                0
0000001E21AD04D8          1 LMSO  global cache service process     TS                0
0000001E21AD2F98          1 LMSV  global cache service process     TS                0
0000001E41A66B58          1 LMS6  global cache service process     TS                0
...
# db alert log
2021-06-03T10:50:19.500768+08:00
LMON started with pid=22, OS id=98747
Starting background process LMD0
2021-06-03T10:50:19.527437+08:00
LMD0 started with pid=23, OS id=98749
Starting background process LMD1
2021-06-03T10:50:19.528918+08:00
* Load Monitor used for high load check
* New Low - High Load Threshold Range = [230400 - 307200]
2021-06-03T10:50:19.703222+08:00
Errors in file /u01/oracle/diag/rdbms/anbob1/anbob11/trace/anbob11_lms0_98751_98758.trc (incident=873064):
ORA-00800: soft external error, arguments: [Set Priority Failed], [LMS0], [Check traces and OS configuration], [Check Oracle document and MOS notes], []
Incident details in: /u01/oracle/diag/rdbms/anbob1/anbob11/incident/incdir_873064/anbob11_lms0_98751_98758_i873064.trc
2021-06-03T10:50:19.711460+08:00
Error attempting to elevate LMS0's priority: no further priority changes will be attempted for this process
LMS0 started with pid=24, OS id=98751_98758
2021-06-03T10:50:19.800751+08:00
Errors in file /u01/oracle/diag/rdbms/anbob1/anbob11/trace/anbob11_lmsd_98808_98825.trc (incident=873065):
ORA-00800: soft external error, arguments: [Set Priority Failed], [LMSD], [Check traces and OS configuration], [Check Oracle document and MOS notes], []
2021-06-03T10:50:19.815049+08:00
Error attempting to elevate LMSD's priority: no further priority changes will be attempted for this process
LMSD started with pid=50, OS id=98808_98825
2021-06-03T10:50:19.924836+08:00
LMD1 started with pid=104, OS id=98950
2021-06-03T10:50:19.924929+08:00
Starting background process LMD2
2021-06-03T10:50:19.944617+08:00
Errors in file /u01/oracle/diag/rdbms/anbob1/anbob11/trace/anbob11_lmsb_98797_98815.trc (incident=873066):
ORA-00800: soft external error, arguments: [Set Priority Failed], [LMSB], [Check traces and OS configuration], [Check Oracle document and MOS notes], []
2021-06-03T10:50:19.945838+08:00
Error attempting to elevate LMSB's priority: no further priority changes will be attempted for this process
Starting background process LMD3
2021-06-03T10:50:19.949748+08:00
Note:
In this environment the LMS processes run in TS mode because raising their priority failed at instance startup with ORA-00800 [Set Priority Failed].
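To see quickly which processes hit this error, the alert log can be scanned for the ORA-00800 line. The following is a small Python sketch; the regular expression is modelled only on the message text in the excerpt above, and `failed_priority_processes` is a hypothetical name.

```python
import re

# Matches the ORA-00800 line from the alert log excerpt and captures the process name.
SET_PRIORITY_FAILED = re.compile(
    r"ORA-00800: soft external error, arguments: \[Set Priority Failed\], \[(\w+)\]"
)


def failed_priority_processes(alert_text: str) -> list:
    """Return process names in order of first appearance, without duplicates."""
    seen = []
    for name in SET_PRIORITY_FAILED.findall(alert_text):
        if name not in seen:
            seen.append(name)
    return seen
```

Applied to the excerpt above, this yields LMS0, LMSD and LMSB.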
# Check oradism
oracle@anbob1a:/home/oracle/scripts_oracle$ ls -l $ORACLE_HOME/bin/oradism
-rwxr-x--- 1 oracle oinstall 147848 Apr 17 2019 /u01/oracle/product/bin/oradism
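In this environment both the owner and the permissions of oradism are wrong: the file should be owned by root and carry the setuid bit, as in the healthy environment. After correcting them, restarting the instance resolves the problem.
Alternatively, root can use chrt to switch the process to RR mode online: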
# chrt -r -p 1 [lms pid]
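The same change can be made from Python, which exposes the underlying system call on Linux. This is only a sketch: `set_round_robin` and `chrt_command` are hypothetical helpers, the call needs root (or CAP_SYS_NICE), and the post does not say whether the process id or an individual thread id should be targeted for the threaded LMS of 18c and later.

```python
import os


def set_round_robin(task_id: int, priority: int = 1) -> None:
    """Switch one task to SCHED_RR at the given real-time priority.

    Raises PermissionError when the caller lacks the required privilege.
    """
    os.sched_setscheduler(task_id, os.SCHED_RR, os.sched_param(priority))


def chrt_command(task_id: int, priority: int = 1) -> list:
    """Build the equivalent chrt argument list, e.g. to log it before applying."""
    return ["chrt", "-r", "-p", str(priority), str(task_id)]
```

Printing `chrt_command(...)` for each target before calling `set_round_robin` leaves a record of what was changed, since the setting does not survive an instance restart.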