求知 文章 文库 Lib 视频 iPerson 课程 认证 咨询 工具 讲座 Modeler   Code  
会员   
 
  
 
 
 
对Group By 语句的一次优化过程
 

2010-09-09 作者:fuyuncat 来源:HelloDBA.com

 

生产环境中发现一条语句很慢,拿回来一看,其实是一个简单的Group By语句:

表CCMMT的数据量比较大,5M多条记录。

1、

SQL> select CDE, CID

  2  from CCMMT

  3  GROUP BY CDE, CID

  4  having max(ADT) < sysdate - 180;

707924 rows selected.

Elapsed: 00:06:17.49

Execution Plan

----------------------------------------------------------

   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=414 Card=238583 Bytes=4771660)

   1    0   FILTER

   2    1     SORT (GROUP BY NOSORT) (Cost=414 Card=238583 Bytes=4771660)

   3    2       TABLE ACCESS (BY INDEX ROWID) OF 'CCMMT' (Cost=414 Card=57969096 Bytes=1159381920)

   4    3         INDEX (FULL SCAN) OF 'CCMMT_TEMP_IDX' (NON-UNIQUE) (Cost=26 Card=57969096)

Statistics

----------------------------------------------------------

          0  recursive calls

          0  db block gets

    2769177  consistent gets

    1089991  physical reads

          0  redo size

   23926954  bytes sent via SQL*Net to client

     519785  bytes received via SQL*Net from client

      47196  SQL*Net roundtrips to/from client

          0  sorts (memory)

          0  sorts (disk)

     707924  rows processed

要6min多返回。尝试调整语句写法,用minus代替Group By:

2、

SQL> select DISTINCT CDE, CID

  2  from CCMMT

  3  where ADT < sysdate - 180

  4  minus

  5  select DISTINCT CDE, CID

  6  from CCMMT

  7  where ADT >= sysdate - 180;

707924 rows selected.

Elapsed: 00:00:21.53

Execution Plan

----------------------------------------------------------

   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=190624 Card=2794940

          Bytes=111797600)

   1    0   MINUS

   2    1     SORT* (UNIQUE) (Cost=95312 Card=2794940 Bytes=55898800)                                               :Q13049001

   3    2       INDEX* (FAST FULL SCAN) OF 'CCMMT_UQ1' (UNIQUE) (Cost=77305 Card=2898455 Bytes=57969100)    :Q13049000

   4    1     SORT* (UNIQUE) (Cost=95312 Card=2794940 Bytes=55898800)                                               :Q13050001

   5    4       INDEX* (FAST FULL SCAN) OF 'CCMMT_UQ1' (UNIQUE) (Cost=77305 Card=2898455 Bytes=57969100)    :Q13050000

   2 PARALLEL_TO_SERIAL            SELECT DISTINCT C0 C0,C1 C1 FROM :Q13049000 ORDER BY C0,C1

   3 PARALLEL_TO_PARALLEL          SELECT /*+ INDEX_RRS(A1 "CCMMT_UQ1")*/ A1."CDE" C0,A1."CA

   4 PARALLEL_TO_SERIAL            SELECT DISTINCT C0 C0,C1 C1 FROM :Q13050000 ORDER BY C0,C1

   5 PARALLEL_TO_PARALLEL          SELECT /*+ INDEX_RRS(A1 "CCMMT_UQ1")*/ A1."CDE" C0,A1."CA

Statistics

----------------------------------------------------------

          0  recursive calls

         33  db block gets

     126566  consistent gets

     129243  physical reads

          0  redo size

   18461368  bytes sent via SQL*Net to client

     519785  bytes received via SQL*Net from client

      47196  SQL*Net roundtrips to/from client

          4  sorts (memory)

          2  sorts (disk)

     707924  rows processed

效果不错,Consistent gets 和 Physical Reads都下降了,同时只需要21s就返回了。但从查询计划看,用到了并行查询,因此会消耗更多的CPU。

在(ADT, CDE, CID )上创建索引,再次执行:

3、

SQL> select DISTINCT CDE, CID

  2  from CCMMT

  3  where ADT < sysdate - 180

  4  minus

  5  select DISTINCT CDE, CID

  6  from CCMMT

  7  where ADT >= sysdate - 180;

707924 rows selected.

Elapsed: 00:00:26.94

Execution Plan

----------------------------------------------------------

   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=36018 Card=2794940 Bytes=111797600)

   1    0   MINUS

   2    1     SORT (UNIQUE) (Cost=18009 Card=2794940 Bytes=55898800)

   3    2       INDEX (RANGE SCAN) OF 'CCMMT_IDX3' (NON-UNIQUE) (Cost=2 Card=2898455 Bytes=57969100)

   4    1     SORT (UNIQUE) (Cost=18009 Card=2794940 Bytes=55898800)

   5    4       INDEX (RANGE SCAN) OF 'CCMMT_IDX3' (NON-UNIQUE) (Cost=2 Card=2898455 Bytes=57969100)

Statistics

----------------------------------------------------------

          0  recursive calls

        118  db block gets

      22565  consistent gets

      31604  physical reads

          0  redo size

   18461368  bytes sent via SQL*Net to client

     519785  bytes received via SQL*Net from client

      47196  SQL*Net roundtrips to/from client

          1  sorts (memory)

          1  sorts (disk)

     707924  rows processed

效果也比较理想,consistent gets和physical reads再次大大下降,返回时间和上面差不多,在一个数量级上,但是不再使用并行查询了。

用NOT Exists代替minus:

4、

SQL> select DISTINCT CDE, CID

  2  from CCMMT a

  3  where ADT < sysdate - 180

  4  AND NOT EXISTS

  5  (SELECT CDE, CID FROM

  6  (select DISTINCT CDE, CID

  7  from CCMMT

  8  where ADT >= sysdate - 180) b

  9  WHERE a.CDE = b.CDE

 10  AND a.CID = b.CID);

707924 rows selected.

Elapsed: 00:10:35.70

Execution Plan

----------------------------------------------------------

   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=600 Card=144923 Bytes=2898460)

   1    0   SORT (UNIQUE) (Cost=600 Card=144923 Bytes=2898460)

   2    1     INDEX (RANGE SCAN) OF 'CCMMT_IDX3' (NON-UNIQUE)(Cost=2 Card=144923 Bytes=2898460)

   3    2       TABLE ACCESS (BY INDEX ROWID) OF 'CCMMT' (Cost=2 Card=1 Bytes=20)

   4    3         INDEX (RANGE SCAN) OF 'CCMMT_TEMP_IDX' (NON-UNIQUE) (Cost=1 Card=9)

Statistics

----------------------------------------------------------

          5  recursive calls

        118  db block gets

   40535587  consistent gets

    3157604  physical reads

          0  redo size

   18461368  bytes sent via SQL*Net to client

     519785  bytes received via SQL*Net from client

      47196  SQL*Net roundtrips to/from client

          2  sorts (memory)

          1  sorts (disk)

     707924  rows processed

FT! consistent gets和physical reads爆涨,10min才返回结果!

用Not In换掉Not Exists:

5、

SQL> select DISTINCT CDE, CID

  2  from CCMMT a

  3  where ADT < sysdate - 180

  4  AND (CDE, CID) NOT IN

  5  (select DISTINCT CDE, CID

  6  from CCMMT

  7  where ADT >= sysdate - 180);

707924 rows selected.

Elapsed: 00:01:00.70

Execution Plan

----------------------------------------------------------

   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=36425 Card=1 Bytes=40)

   1    0   SORT (UNIQUE NOSORT) (Cost=36425 Card=1 Bytes=40)

   2    1     MERGE JOIN (ANTI) (Cost=36423 Card=1 Bytes=40)

   3    2       SORT (JOIN) (Cost=18212 Card=2898455 Bytes=57969100)

   4    3         INDEX (RANGE SCAN) OF 'CCMMT_IDX3' (NON-UNIQUE) (Cost=2 Card=2898455 Bytes=57969100)

   5    2       SORT (UNIQUE) (Cost=18212 Card=2898455 Bytes=57969100)

   6    5         INDEX (RANGE SCAN) OF 'CCMMT_IDX3' (NON-UNIQUE) (Cost=2 Card=2898455 Bytes=57969100)

Statistics

----------------------------------------------------------

          0  recursive calls

        419  db block gets

      22565  consistent gets

      98692  physical reads

          0  redo size

   18461368  bytes sent via SQL*Net to client

     519785  bytes received via SQL*Net from client

      47196  SQL*Net roundtrips to/from client

          1  sorts (memory)

          1  sorts (disk)

     707924  rows processed

恩,consistent gets和建了索引时的minus方式一样,但是physical reads太大,返回时间太长---1min。同时用到了刚才建的索引。(呵呵,所以说,NOT EXISTS并不是什么情况下都比NOT IN更优啊)

在尝试用left join + is null代替not in:

6、

SQL> SELECT a.CDE, a.CID

  2  FROM

  3  (select DISTINCT CDE, CID

  4  from CCMMT

  5  where ADT < sysdate - 180) a,

  6  (select DISTINCT CDE, CID

  7  from CCMMT

  8  where ADT >= sysdate - 180) b

  9  WHERE a.CDE = b.CDE(+)

 10  AND a.CID = b.CID(+)

 11  AND b.CDE IS NULL;

707924 rows selected.

Elapsed: 00:00:25.46

Execution Plan

----------------------------------------------------------

   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=54675 Card=2794940 Bytes=117387480)

   1    0   FILTER

   2    1     MERGE JOIN (OUTER)

   3    2       VIEW (Cost=18009 Card=2794940 Bytes=58693740)

   4    3         SORT (UNIQUE) (Cost=18009 Card=2794940 Bytes=55898800)

   5    4           INDEX (RANGE SCAN) OF 'CCMMT_IDX3' (NON-UNIQUE) (Cost=2 Card=2898455 Bytes=57969100)

   6    2       SORT (JOIN) (Cost=36667 Card=2794940 Bytes=58693740)

   7    6         VIEW (Cost=18009 Card=2794940 Bytes=58693740)

   8    7           SORT (UNIQUE) (Cost=18009 Card=2794940 Bytes=55898800)

   9    8             INDEX (RANGE SCAN) OF 'CCMMT_IDX3' (NON-UNIQUE) (Cost=2 Card=2898455 Bytes=57969100)

Statistics

----------------------------------------------------------

         10  recursive calls

        118  db block gets

      22569  consistent gets

      31300  physical reads

          0  redo size

   18461368  bytes sent via SQL*Net to client

     519785  bytes received via SQL*Net from client

      47196  SQL*Net roundtrips to/from client

          6  sorts (memory)

          1  sorts (disk)

     707924  rows processed 

效果不错,和有索引时使用minus在同一数量级上。

总结,以上几种方式中,效果最好的应该是第3种和第6种,buffer gets、磁盘IO和CPU消耗都比较少,返回时间大大减少,但是需要新建一个索引,消耗更多磁盘空间,并存在影响其它语句的正常查询计划的风险。而第2种方式应该是次好的。在返回时间上,和上面两种差不多,不需要新的索引,但是会消耗更多的内存、磁盘和CPU资源。

出于综合考虑,采用了第2种方式对生产库进行了优化。

(以上例子中的对象名进行了替换,其他都是原版)



MySQL索引背后的数据结构
MySQL性能调优与架构设计
SQL Server数据库备份与恢复
让数据库飞起来 10大DB2优化
oracle的临时表空间写满磁盘
数据库的跨平台设计
更多...   


并发、大容量、高性能数据库
高级数据库架构设计师
Hadoop原理与实践
Oracle 数据仓库
数据仓库和数据挖掘
Oracle数据库开发与管理


GE 区块链技术与实现培训
航天科工某子公司 Nodejs高级应用开发
中盛益华 卓越管理者必须具备的五项能力
某信息技术公司 Python培训
某博彩IT系统厂商 易用性测试与评估
中国邮储银行 测试成熟度模型集成(TMMI)
中物院 产品经理与产品管理
更多...