[Oracle] cursor: mutex X等待

博客首页 » Oracle cursor: mutex X等待

发布于 21 Dec 2014 14:22
标签 blog
cursor: mutex X等待

cursor: mutex X等待事件

2011/08/08 BY MACLEAN LIU
http://www.oracledatabase12g.com/archives/cursor-mutex-x.html

cursor: mutex * events等待事件用于Cursor Parent 和 Cursor stats类型的操作：
‘cursor: mutex X’ ，某个进程申请以EXCL mode持有mutex时进入该等待，该Mutex要么正被其他进程以SHRD模式参考，这导致X mode的申请必须要等待直到Ref count=0，或者该mutex正被另一个进程以X mode持有。
相关操作要求以EXCL X mode持有Mutex的：
在一个父游标下创建一个新的子游标
捕获SQL中的绑定变量
更新或构件SQL统计信息V$SQLSTATS

http://www.askmaclean.com/archives/understanding-oracle-mutex.html

系统出现cursor: mutex X等待导致实例HANG死

2012-09-24 23:43:01
http://blog.itpub.net/4227/viewspace-749442/

客户数据库出现明显的cursor: mutex X等待，尝试通过ALTER SYSTEM KILL SESSION的方式释放等待的会话，结果导致整个实例出现HANG死的情况。

数据库版本为10.2.0.4，检查数据库问题发生时刻的AWR报告，发现cursor: mutex X等待在AWR报告中并不明显，不但没有出现在TOP 5里，而在整个等待事件的排名中也排在非常靠后的位置，且该事件的总等待时间为0，这与客户反馈的大约70多个会话持续等待cursor: mutex X事件的描述极为不符。
导致这个现象的原因可能有两种，一种是虽然从V$SESSION中看到当前的等待是cursor :mutex X事件，但是该事件已经结束，因此会话的等待时间并未记录并汇总到这个事件的等待时间中；另外一种情况是Oracle对于这个事件的统计值存在错误。
继续分析AWR报告，可以发现VERSION COUNT报告中，最高的两个SQL已经远远超过了正常的范围，达到了13W以上，毫无疑问这肯定不正常的情况所致：

Version Count	Executions	SQL Id	SQL Module	SQL Text
150,649		7q1tdx7ysmg6f		SQL Text Not Available
134,787		fa3vfvs859n5n		SQL Text Not Available
10,732		gr0wk1tf0npm4		select count(:"SYS_B_0") as ro…
6,547		6r91qu3knm3vz		SQL Text Not Available
5,888		36ft4hmuru5d2		SQL Text Not Available

排在第3和第4位的VERSION COUNT达到5000以上就已经不太正常了，何况最高的两条语句的值已经超过了13W，不用说这两条语句会占用大量的共享池空间：

Sharable Mem (b)	Executions	% Total	SQL Id	SQL Module	SQL Text
3,082,390,224		28.84	7q1tdx7ysmg6f		SQL Text Not Available
2,902,320,992		27.16	fa3vfvs859n5n		SQL Text Not Available
174,014,440		1.63	gr0wk1tf0npm4		select count(:"SYS_B_0") as ro…
126,804,120		1.19	36ft4hmuru5d2		SQL Text Not Available
106,961,832		1	6r91qu3knm3vz		SQL Text Not Available

果然不出所料，每个问题SQL占用的共享池空间都达到了3G左右。
显然导致当前cursor: mutex X的问题肯定和这两个SQL直接相关，最后通过ASH视图验证一下这个判断：
SQL> select to_char(sample_time, 'yyyy-mm-dd hh24:mi:ss.ff'), session_id, event_name, sql_id
2 from wrh$_active_session_history ash, wrh$_event_name en
3 where ash.dbid = 1944025784
4 and en.dbid = 1944025784
5 and ash.event_id = en.event_id(+)
6 and event_name like '%mutex%'
7 and sample_time <= to_timestamp('20120908', 'yyyymmdd')
8 order by 1;
TO_CHAR(SAMPLE_TIME,'YYYY-MM- SESSION_ID EVENT_NAME SQL_ID
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
2012-09-07 10:00:15.044 882 cursor: mutex X fa3vfvs859n5n
2012-09-07 10:00:25.053 882 cursor: mutex X fa3vfvs859n5n
2012-09-07 10:00:35.061 882 cursor: mutex X 7q1tdx7ysmg6f
2012-09-07 10:00:45.069 882 cursor: mutex X fa3vfvs859n5n
2012-09-07 10:00:55.078 882 cursor: mutex X 7q1tdx7ysmg6f
2012-09-07 10:01:05.086 882 cursor: mutex X 7q1tdx7ysmg6f
2012-09-07 10:01:15.093 882 cursor: mutex X 7q1tdx7ysmg6f
2012-09-07 10:01:25.101 882 cursor: mutex X 7q1tdx7ysmg6f
2012-09-07 10:01:35.109 882 cursor: mutex X fa3vfvs859n5n
2012-09-07 10:01:45.117 882 cursor: mutex X fa3vfvs859n5n
2012-09-07 10:01:55.126 882 cursor: mutex X fa3vfvs859n5n
2012-09-07 10:02:05.134 983 cursor: mutex X 7q1tdx7ysmg6f
2012-09-07 10:02:15.141 983 cursor: mutex X fa3vfvs859n5n
2012-09-07 10:02:25.149 983 cursor: mutex X 7q1tdx7ysmg6f
2012-09-07 10:02:35.157 983 cursor: mutex X
2012-09-07 10:02:45.164 983 cursor: mutex X 7q1tdx7ysmg6f
2012-09-07 10:02:55.173 983 cursor: mutex X 7q1tdx7ysmg6f
2012-09-07 10:03:05.181 983 cursor: mutex X 7q1tdx7ysmg6f
2012-09-07 10:03:15.188 983 cursor: mutex X 7q1tdx7ysmg6f
2012-09-07 10:03:25.196 983 cursor: mutex X 7q1tdx7ysmg6f
2012-09-07 10:03:35.204 983 cursor: mutex X 7q1tdx7ysmg6f
2012-09-07 10:03:45.211 983 cursor: mutex X 7q1tdx7ysmg6f
2012-09-07 10:03:55.220 983 cursor: mutex X 7q1tdx7ysmg6f
2012-09-07 10:04:05.228 983 cursor: mutex X 7q1tdx7ysmg6f
.
.
.
2012-09-07 11:59:02.738 887 cursor: mutex X fa3vfvs859n5n
2012-09-07 11:59:12.748 887 cursor: mutex X 7q1tdx7ysmg6f
2012-09-07 11:59:22.757 887 cursor: mutex X fa3vfvs859n5n
2012-09-07 11:59:32.767 887 cursor: mutex X 7q1tdx7ysmg6f
428 rows selected.
SQL> select sql_id, count(*)
2 from wrh$_active_session_history ash, wrh$_event_name en
3 where ash.dbid = 1944025784
4 and en.dbid = 1944025784
5 and ash.event_id = en.event_id(+)
6 and event_name like '%mutex%'
7 and sample_time <= to_timestamp('20120908', 'yyyymmdd')
8 group by sql_id;
SQL_ID COUNT(*)
- - - - - - - - - - - - - - - - - - - - - - -
1
7q1tdx7ysmg6f 232
fa3vfvs859n5n 190
43tcru5bkgdvc 5
很明显，导致系统出现cursor: mutex X等待的就是AWR报告中看到的两个语句，正是这两个语句不断的产生新的VERSION，导致了cursor: mutex X事件，并占用越来越多的共享池空间，最终导致整个数据库出现了HANG死的情况。
根据分析导致这个问题的最直接的原因是初始化参数CURSOR_SHARING设置为SIMILAR。而CURSOR_SHARING设置为SIMILAR从而导致VERSION COUNT过大的问题已经有过很多案例了，对于当前的数据库而言，要想彻底避免这个问题，除了升级到最高的10.2.0.5.6以外，就是通过改变CURSOR_SHARING的设置。显然EXACT是最佳方案，但是这必须同时修改程序使用绑定变量，才能避免由于参数修改带来的硬解析的问题。此外修改为FORCE也是一个选择，不过这需要在测试环境中进行验证，避免此修改带来的性能问题和其他BUG的风险。

案例升级11g后出现 library cache: mutex X

环境：Oracle 11gR2 11.2.0.2 on Linux
Event Waits Time(s) Avg wait (ms) % DB time Wait Class
cursor: mutex S 305,105,364 627,169 2 62.69 Concurrency
DB CPU 153,958 15.39
library cache lock 274,879 123,937 451 12.39 Concurrency
enq: TX - row lock contention 3,968 96,278 24264 9.62 Application
library cache: mutex X 445,234 19,625 44 1.96 Concurrency

在业务高峰期的时候DB服务器CPU利用率将近100%，大量会话在cursor: mutex S上等待。DB中具有高version count值的SQL，
Google下，有下面2篇文章可以参考，应该是碰到了11gR2的Bug。
http://qa.taobao.com/?p=9131 【GBA】高并发下出现“cursor mutex s”致命错误-耿电
http://www.usn-it.de/index.php/2010/08/04/oracle112-mutex-s-too-many-child-cursors/ Oracle 11.2: Cursor Mutex S wait event and too many (2^30) child cursors

大家在Oracle11g的使用过程中有碰到类似的问题吗？是如何解决的？

棉花糖ONE：把adapative cursor sharing禁用掉后再看看

回复 #4 棉花糖ONE 的帖子

的确，可以测试下。
通过下面的方式关闭adapative cursor sharing.
alter system set "_optimizer_extended_cursor_sharing_rel"=none;
alter system set "_optimizer_extended_cursor_sharing"=none;
alter system set "_optimizer_adaptive_cursor_sharing"=false;

顺便提下：
我们的应用跑在9i、10g下都没有问题，就升到11g之后才出的问题。
version count多的SQL是个insert的SQL，SQL里面有个直接对sequence.nextval的调用。
现在通过改写SQL（首先取sequence.nextval到变量，再在INSERT语句中使用）达到了减少version count和child cursor的目的。
能否解决问题还在测试中。

去年在生产库碰到此问题，给折磨了好一阵后，把cursor_sharing改为force后解决。
/*
created by husthxd 2010-12-16
*/

一、问题诊断
通过检查出问题时的statspack报告诊断数据库。
1、数据库实例状态
Instance Efficiency Indicators

Buffer Nowait %: 100.00 Redo NoWait %: 100.00
Buffer Hit %: 99.95 Optimal W/A Exec %: 100.00
Library Hit %: 62.49 Soft Parse %: 96.42
Execute to Parse %: 15.47 Latch Hit %: 99.94
Parse CPU to Parse Elapsd %: 49.25 % Non-Parse CPU: 75.08

从以上信息可以看出：
Library Hit偏低（通常应>95%）;
Execute to Parse明显偏低，每Execute 10次，需Parse 8.5次，Parse是耗CPU的操作，这也是导致CPU居高不下的原因。
从后续的系统统计视图也可以看出：
Statistic Time (s) % DB time
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
parse time elapsed 2,602.9 99.2

2、Top 5 Wait Events
从top 5等待事件可以看出与Library和Cursor相关的等待很高，合计占所有等待时间的98.8%
Top 5 Timed Events Avg %Total
~~~~~~~~~~~~~~~~~~ wait Call
Event Waits Time (s) (ms) Time
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
library cache lock 21,461 923 43 55.6
cursor: mutex S 5,220,286 717 0 43.2
CPU time 12 .7
db file sequential read 418 3 7 .2
log file sync 773 1 2 .1
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

3、检查Wait Events
Wait Events (fg and bg) DB/Inst: SDMZLW/sdmzlw Snaps: 5-6
Avg %Total
%Tim Total Wait wait Waits Call
Event Waits out Time (s) (ms) /txn Time
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
library cache lock 21,461 0 923 43 32.7 55.6
cursor: mutex S 5,220,286 100 717 0 7,945.6 43.2
注意：等待时间cursor: mutex S的等待超时（Time Out）是100%，与出问题时数据库一直在执行简单的查询语句而没有返回的现象吻合。

二、解决方案
使用cursor: mutex S 关键字google，从找到的相关资料可以看出
mutex是10g以后出现的内存结构，在10g默认没有使用，在11g才开始启用，之所以出现问题有可能是因为数据库的bug引起的。
解决方法：
1/把数据库参数cursor_sharing设置为force(可在线修改)。
2/停止自动收集统计信息的job
3/停止AWR收集JOB
4/出现上述情况主要是某个SQL的并行执行次数太多了而导致在child cursor上的mutex操作争用。因为SQL的并行太多，将该SQL复制成N个其它的SQL。
如：
select /*SQL 1*/ object_name from vw_mzlw_cbxx_bl where <条件语句>
select /*SQL 2*/ object_name from vw_mzlw_cbxx_bl where <条件语句>
select /*SQL …*/ object_name from vw_mzlw_cbxx_bl where <条件语句>
select /*SQL N*/ object_name from vw_mzlw_cbxx_bl where <条件语句>
这样就有了N个SQL Cursor,N个Mutex内存结构，就将争用分散开来，类似partition的作用了。

具体做法是在java拼装sql的时候，人为的加入区别sql的语句：
如：在select语句后加注释语句"/**/"
SELECT /*SQL <随机产生一个1-500的值>*/ * from vw_x1 where <条件语句>
SELECT /*SQL <随机产生一个1-500的值>*/ * from vw_x2 where <条件语句>

三、参考资料
http://www.dbform.com/html/2007/341.html
http://www.dbtan.com/2010/05/oracle-10g11g-latch.html
http://yumianfeilong.com/html/2008/11/01/254.html

本页面的文字允许在知识共享署名-相同方式共享 3.0协议和GNU自由文档许可证下修改和再使用，仅有一个特殊要求，请用链接方式注明文章引用出处及作者。请协助维护作者合法权益。

系列文章

文章列表

mysql InnoDB数据页结构Page Header

mysql InnoDB 文件数据页File Header结构

Oracle cursor: mutex X等待

Oracle 统计值Cluster Factor的含义和测试

Web JSON正确的MIME header

这篇文章对你有帮助吗，投个票吧？

留下你的评论

Add a New Comment

blog

page revision: 2, last edited: 21 Dec 2014 14:56

Edit Tags History Files Print Site tools + Options

MySQL, Oracle Abc - The Database Wiki

Focus on MySQL and Oracle Databases

回到首页

Database基礎

Oracle Course

MySQL Course

Cisco Course

Microsoft Course

Knowledge

Related Knowdge

共通

Tag

New

cursor: mutex X等待事件

系统出现cursor: mutex X等待导致实例HANG死

案例升级11g后出现 library cache: mutex X

留下你的评论

MySQL, Oracle Abc - The Database Wiki

Focus on MySQL and Oracle Databases

回到首页

Database基礎

Oracle Course

MySQL Course

Cisco Course

Microsoft Course

Knowledge

Related Knowdge

共通

Tag

New

cursor: mutex X等待事件

系统出现cursor: mutex X等待导致实例HANG死

案例 升级11g后出现 library cache: mutex X

留下你的评论

案例升级11g后出现 library cache: mutex X