[Oracle] 统计值Cluster Factor的含义和测试

博客首页 » Oracle 统计值Cluster Factor的含义和测试

发布于 21 Dec 2014 10:16
标签 blog
oracle index ( cluster_factor)

http://blog.itpub.net/7194105/viewspace-678357/

clustering_factor 是表征表中数据的存储顺序和某索引字段顺序的符合程度。

oracle 按照索引块中存储的rowid 来识别相临的索引中记录 在 表block中是否为相同块,如果索引中存在记录 rowid a,b,c,d……,若b 和 a 是同一个block,则比较 c 和 b,若这时不是同一个block,则clustering_factor + 1 ,然后比较 d 和 c,若还不是同一个 block,则clustering_factor + 1 ……

这样计算下来,clustering_factor 应该是介于 表 block数量 和 表记录数之间 的一个值,若 clustering_factor 接近 表block数量,则说明表中数据具有比较好的跟索引字段一样排序顺序的存储,通过索引进行 range scan 的代价比较小(需要读取的表块可能比较少),若clustering_factor 接近 row数量,则说明表中数据和索引字段排序顺序差异很大,杂乱无张。则通过索引进行 range scan 的代价比较大(需要读取的表块可能更多)。

当然,在oracle 920 开始,对于cluster_factor 比较接近表块数量的根据索引的大范围查询做了特别的处理,不再是读一个索引记录去搜索一个表记录了,而是成批处理(通过索引块一批rowid一次去表块中获得一批记录),这样就大大节约了读的成本( consistent gets)。

Oracle Index Clustering Factor(集群因子)

http://www.cnblogs.com/Richardzhu/articles/2874972.html

一、本文说明:

今天在做测试的时候发现字段上有索引,但是执行计划就是不走索引,经过在网上查找才发现原来是索引的集群因子过高导致的。本文属于转载+模拟。

二、官网说明

The index clustering factor measures row order in relation to an indexed value suches employee last name.The more order that exists in rowstorage for this value,the lower the clustering factor.

- - - - row存储的越有序,clustering factor的值越低。

The clustering factor is useful as a rough measure of the number of I/Os required to read an entire table by means of an index:

(1)、If the clustering factor is high,then Oracle Database performs a relatively high number of I/Os during a large index range scan.The index entriespoint to random table blocks,so the database may have to read and reread the same blocks over and over again to retrieve the data pointed to by the index.

- - - - 当clustering factor很高时,说明index entry (rowid) 是随机指向一些block的,在一个大的index range scan时,这样为了读取这些rowid指向的block,就需要一次又一次重复的去读这些block。

(2)、If the clustering factor is low,then Oracle Database performs a relatively low number of I/Os during a large index range scan.The index keys in arange tend to point to the same data blcok,so the database does not have to read and reread the same blocks over and over.

- - - - 当clustering factor值低时,说明index keys (rowid) 是指向的记录是存储在相同的block里,这样去读row时,只需要在同一个block里读取就可以了,这样减少重复读取blocks的次数。

The clustering factor is relevant for index scans because it can show:

(1)、Whether the database will use an index for large range scans;

(2)、The degree of table organization in relation to the index key;

(3)、Whether you should consider using an index-organized table,partitioning,or table cluster if rows must be ordered by the index key.

三、Index Clustering Factor说明

简单的说,Index Clustering Factor是通过一个索引扫描一张表,需要访问的表的数据块的数量,即对I/O的影响,也代表索引键存储位置是否有序。

(1)、如果越有序,即相邻的键值存储在相同的block,那么这时候Clustering Factor的值就越低;

(2)、如果不是很有序,即键值是随机的存储在block上,这样在读取键值时,可能就需要一次又一次的去访问相同的block,从而增加了I/O。

Clustering Factor的计算方式如下:

(1)、扫描一个索引(large index range scan);

(2)、比较某行的rowid和前一行的rowid,如果这两个rowid不属于同一个数据块,那么cluster factor增加1;

(3)、整个索引扫描完毕后,就得到了该索引的clustering factor。

如果clustering factor接近于表存储的块数,说明这张表是按照索引字段的顺序存储的。

如果clustering factor接近于行的数量,那说明这张表不是按索引字段顺序存储的。

在计算索引访问成本的时候,这个值十分有用。Clustering Factor乘以选择性参数(selectivity)就是访问索引的开销。

如果这个统计数据不能真实反映出索引的真实情况,那么可能会造成优化器错误的选择执行计划。另外如果某张表上的大多数访问是按照某个索引做索引扫描,那么将该表的数据按照索引字段的顺序重新组织,可以提高该表的访问性能。

四、测试

4.1、产生问题:

复制代码
- - - - 查看一下数据库的版本- - - -
1 SQL> select * from v$version where rownum=1;
2
3 BANNER
4 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
5 Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
6
- - - - 创建一张测试表jack- - - -
7 SQL> create table jack as select * from dba_objects where 1=2;
8
9 Table created.
10
- - - - 将数据无序的插入jack表中- - - -
11 SQL> begin
12 2 for i in 1..10 loop
13 3 insert /*+ append */ into jack select * from dba_objects order by i;
14 4 commit;
15 5 end loop;
16 6 end;
17 7 /
18
19 PL/SQL procedure successfully completed.
20
21 SQL> select count(*) from jack;
22
23 COUNT(*)
24 - - - - - - - - - -
25 725460
26
- - - - 查看一下表的大小- - - - -
27 SQL> set wrap off
28 SQL> col owner for a10;
29 SQL> col segment_name for a15;
30 SQL> select segment_name,blocks,extents,bytes/1024/1024||'M' "size" from user_segments where segment_name='JACK';
31
32 SEGMENT_NAME BLOCKS EXTENTS size
33 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
34 JACK 11264 82 88M
35
- - - - 在object_id上创建索引- - - -
36 SQL> create index jack_ind on jack(object_id);
37
38 Index created.
39
- - - - 查看一下索引的大小- - - -
40 SQL> select segment_name,segment_type,blocks,extents,bytes/1024/1024||'M' "size" from user_segments where segment_name='JACK_IND';
41
42 SEGMENT_NAME SEGMENT_TYPE BLOCKS EXTENTS size
43 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
44 JACK_IND INDEX 1664 28 13M
- - - - 在没有收集相关的统计信息之前,查看一下index clustering factor- - - -
45 SQL> select index_name,clustering_factor,num_rows from user_indexes where index_name='JACK_IND';
46
47 INDEX_NAME CLUSTERING_FACTOR NUM_ROWS
48 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
49 JACK_IND 725460 725460
50
- - - - 简单的收集一下统计信息- - - -
51 SQL> exec dbms_stats.gather_table_stats(user,'jack',cascade=>true);
52
53 PL/SQL procedure successfully completed.
54
- - - - 再次查看index clustering factor- - - -
55 SQL> select index_name,clustering_factor,num_rows from user_indexes where index_name='JACK_IND';
56
57 INDEX_NAME CLUSTERING_FACTOR NUM_ROWS
58 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
59 JACK_IND 725460 725460 - - - - 显然统计信息收集前和后,clustering factor值不变,说在创建索引的时候,会收集表中的数据真正的行数。并且这里的clustering factor等num_rows,也说明表的clustering factor是无序的。
60
- - - - 查看一个确定值,然后查看执行计划- - - -
61 SQL> explain plan for select * from jack where object_id=1501;
62
63 Explained.
64
65 SQL> select * from table(dbms_xplan.display);
66
67 PLAN_TABLE_OUTPUT
68 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
69 Plan hash value: 2860868395
70
71 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
72 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Ti
73 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
74 | 0 | SELECT STATEMENT | | 10 | 970 | 13 (0)| 00
75 | 1 | TABLE ACCESS BY INDEX ROWID| JACK | 10 | 970 | 13 (0)| 00
76 |* 2 | INDEX RANGE SCAN | JACK_IND | 10 | | 3 (0)| 00
77 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
78
79 Predicate Information (identified by operation id):
80
81 PLAN_TABLE_OUTPUT
82 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
83
84
85 2 - access("OBJECT_ID"=1501)
86
87 14 rows selected. - - - - 在这里走了索引,cost为13.
88
89 SQL> alter system flush buffer_cache;
90
91 System altered.
92
93 SQL> set autotrace traceonly;
- - - - 查询一个范围的执行计划- - - -
94 SQL> select * from jack where object_id>1000 and object_id<2000;
95
96 9880 rows selected.
97
98
99 Execution Plan
100 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
101 Plan hash value: 949574992
102
103 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
104 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
105 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
106 | 0 | SELECT STATEMENT | | 9657 | 914K| 1824 (1)| 00:00:22 |
107 |* 1 | TABLE ACCESS FULL| JACK | 9657 | 914K| 1824 (1)| 00:00:22 |
108 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
109
110 Predicate Information (identified by operation id):
111 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
112
113 1 - filter("OBJECT_ID"<2000 AND "OBJECT_ID">1000)
114
115
116 Statistics
117 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
118 0 recursive calls
119 0 db block gets
120 10993 consistent gets
121 10340 physical reads
122 0 redo size
123 471945 bytes sent via SQL*Net to client
124 7657 bytes received via SQL*Net from client
125 660 SQL*Net roundtrips to/from client
126 0 sorts (memory)
127 0 sorts (disk)
128 9880 rows processed - - - - 注意,object_id上是有索引的,但是这里并没有使用索引,而是使用了全表扫描。
129
130 SQL> alter system flush buffer_cache;
131
132 System altered.
133
- - - - 强制走索引,查看执行计划- - - -
134 SQL> select /*+ index(jack jack_ind) */ * from jack where object_id>1000 and object_id<2000;
135
136 9880 rows selected.
137
138
139 Execution Plan
140 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
141 Plan hash value: 2860868395
142
143 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
144 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
145 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
146 | 0 | SELECT STATEMENT | | 9657 | 914K| 9683 (1)| 00:01:57 |
147 | 1 | TABLE ACCESS BY INDEX ROWID| JACK | 9657 | 914K| 9683 (1)| 00:01:57 |
148 |* 2 | INDEX RANGE SCAN | JACK_IND | 9657 | | 24 (0)| 00:00:01 |
149 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
150
151 Predicate Information (identified by operation id):
152 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
153
154 2 - access("OBJECT_ID">1000 AND "OBJECT_ID"<2000)
155
156
157 Statistics
158 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
159 0 recursive calls
160 0 db block gets
161 10561 consistent gets
162 164 physical reads
163 0 redo size
164 988947 bytes sent via SQL*Net to client
165 7657 bytes received via SQL*Net from client
166 660 SQL*Net roundtrips to/from client
167 0 sorts (memory)
168 0 sorts (disk)
169 9880 rows processed
- - - - 强制走索引之后,使用了index range scan,但是cost变成了9683,而全表扫描时是1824.
- - - - 还有比较一下两次查询中物理读的情况:全表扫描的物理读明显比索引的要高很多,但是Oracle却没有使用索引。
- - - - 因此Oracle认为走索引的Cost比走全表扫描大,而是大N倍,CBO是基于Cost来决定执行计划的。
- - - - 由此得出,对于索引的Cost,Oracle是根据clustering factor参数来计算的,而该实验中的clustering factor参数是很高的,数据存储无序。这就造成了Oracle认为走索引的cost比全表扫描的大。
复制代码
4.2、解决问题:

复制代码
- - - - 通过上面的分析,可以看出,要降低clustering factor才能解决问题,而要解决clustering factor,就需要重新对表的存储位置进行排序。- - - -
- - - - 重建jakc表- - - -
1 SQL> create table echo as select * from jack where 1=0;
2
3 Table created.
4
5 SQL> insert /*+ append */ into echo select * from jack order by object_id;
6
7 725460 rows created.
8
9 SQL> commit;
10
11 Commit complete.
12
13 SQL> truncate table jack;
14
15 Table truncated.
16
17 SQL> insert /*+ append */ into jack select * from echo;
18
19 725460 rows created.
20
21 SQL> commit;
22
23 Commit complete.
24
- - - - 查看表和索引的信息- - - -
25 SQL> select segment_name,blocks,extents,bytes/1024/1024||'M' "size" from user_segments where segment_name='JACK';
26
27 SEGMENT_NAME BLOCKS EXTENTS size
28 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
29 JACK 11264 82 88M
30
31 SQL> select segment_name,segment_type,blocks,extents,bytes/1024/1024||'M' "size" from user_segments where segment_name='JACK_IND';
32
33 SEGMENT_NAME SEGMENT_TYPE BLOCKS EXTENTS size
34 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
35 JACK_IND INDEX 1536 27 12M
36
37 SQL> select index_name,clustering_factor,num_rows from user_indexes where index_name='JACK_IND';
38
39 INDEX_NAME CLUSTERING_FACTOR NUM_ROWS
40 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
41 JACK_IND 725460 725460
42
- - - - 对索引进行rebuild- - - -
43 SQL> alter index jack_ind rebuild;
44
45 Index altered.
46
- - - - 查看cluster factor- - - -
47 SQL> select index_name,clustering_factor,num_rows from user_indexes where index_name='JACK_IND';
48
49 INDEX_NAME CLUSTERING_FACTOR NUM_ROWS
50 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
51 JACK_IND 10327 725460 - - - - - - 注意这里的Factor,已经变成10327,我们收集一下表的统计信息,然后与表的block进行一次比较。
52
53 SQL> exec dbms_stats.gather_table_stats(user,'jack',cascade=>true);
54
55 PL/SQL procedure successfully completed.
56
57 SQL> select blocks from dba_tables where table_name='JACK';
58
59 BLOCKS
60 - - - - - - - - - -
61 10474 - - - - 表jack实际使用的block是10474,clustering factor是10327基本还是比较接近了,这也说明相邻的row是存储在相同的block里。
62
63 SQL> select index_name,clustering_factor,num_rows from user_indexes where index_name='JACK_IND';
64
65 INDEX_NAME CLUSTERING_FACTOR NUM_ROWS
66 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
67 JACK_IND 10327 725460
68
69 SQL> alter system flush buffer_cache;
70
71 System altered.
72
73 SQL> set autotrace traceonly;
- - - - 再次查看之前sql的执行计划- - - -
74 SQL> select * from jack where object_id>1000 and object_id<2000;
75
76 9880 rows selected.
77
78
79 Execution Plan
80 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
81 Plan hash value: 2860868395
82
83 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
84 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
85 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
86 | 0 | SELECT STATEMENT | | 9657 | 914K| 162 (0)| 00:00:02 |
87 | 1 | TABLE ACCESS BY INDEX ROWID| JACK | 9657 | 914K| 162 (0)| 00:00:02 |
88 |* 2 | INDEX RANGE SCAN | JACK_IND | 9657 | | 24 (0)| 00:00:01 |
89 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
90
91 Predicate Information (identified by operation id):
92 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
93
94 2 - access("OBJECT_ID">1000 AND "OBJECT_ID"<2000)
95
96
97 Statistics
98 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
99 1 recursive calls
100 0 db block gets
101 1457 consistent gets
102 151 physical reads
103 0 redo size
104 988947 bytes sent via SQL*Net to client
105 7657 bytes received via SQL*Net from client
106 660 SQL*Net roundtrips to/from client
107 0 sorts (memory)
108 0 sorts (disk)
109 9880 rows processed
- - - - 注意这里的cost已经降到了162,性能提升还是非常明显。
复制代码
五、小结

通过以上说明和测试,可以看到clustering factor也是索引健康的一个重要判断的标准。其值越低越好。它会影响CBO选择正确的执行计划。但是注意一点,clustering factor总是趋势与不断恶化的。


本页面的文字允许在知识共享 署名-相同方式共享 3.0协议和GNU自由文档许可证下修改和再使用,仅有一个特殊要求,请用链接方式注明文章引用出处及作者。请协助维护作者合法权益。


系列文章

文章列表

  • Oracle 统计值Cluster Factor的含义和测试

这篇文章对你有帮助吗,投个票吧?

rating: 0+x

留下你的评论

Add a New Comment
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License