在GBase 8a中,会将子查询保存到临时表里。 如果多个子查询是完全相同的,则会导致资源浪费。本文提供了CTE和参数共2个可行方案来优化这个场景。
场景
某个子查询的结果,会被多次使用,类似如下的
select XXX from (SubQueryA) ta
union all
select YYY from (SubQueryA) tb
union all
select ZZZ from (SubQueryA) tc
优化方案
采用CTE(Common Table Express)
SQL将优化成如下写法,该SQL将只执行一次,节省资源,同时SQL语句也更易读。
with t as SubQueryA
select XXX from t
union all
select YYY from t
union all
select ZZZ from t
参考 GBase 8a 支持CTE 功能with as 介绍
调整数据库参数
CTE方法需要重写SQL语句,如果已上线的系统,短时间内无法改写,可以通过数据库的参数来【复用】临时表。
_t_gcluster_reuse_tmp_table_optimize
该参数默认为0,关闭,设置为2From子查询可以复用。
支持session 级。
如下是开启前和开启后的执行计划对比
默认执行计划
可以看到步骤0和步骤1是各自独立运行的,生成了2个临时表,步骤2将0和1的临时表做了最终处理。
gbase> explain select sum(id) from (select id,count(*) cnt from t1 group by id) t1 union all select sum(cnt) from (select id,count(*) cnt from t1 group by id) t1 ;
+----+----------+------------+--------+-------------+
| ID | MOTION | OPERATION | TABLE | CONDITION |
+----+----------+------------+--------+-------------+
| 02 | [RESULT] | Step | <00> | |
| | | AGG | | |
| | | UNION ALL | | |
| | | Step | <01> | |
| | | AGG | | |
| 01 | [GATHER] | SubQuery3 | t1 | |
| | | Table | t1[id] | |
| | | GROUP | | GROUP BY id |
| | | AGG | | |
| 00 | [GATHER] | SubQuery1 | t1 | |
| | | Table | t1[id] | |
| | | GROUP | | GROUP BY id |
| | | AGG | | |
+----+----------+------------+--------+-------------+
13 rows in set (Elapsed: 00:00:00.00)
开启参数后的执行计划
可以看到子查询会先执行,生成临时表0, 然后两部分各自在0的临时表的基础上,执行查询,并生成01,02的临时表,最后03将01和02汇总得到结果。
gbase> explain select sum(id) from (select id,count(*) cnt from t1 group by id) t1 union all select sum(cnt) from (select id,count(*) cnt from t1 group by id) t1 ;
+----+-------------+------------+--------+-------------+
| ID | MOTION | OPERATION | TABLE | CONDITION |
+----+-------------+------------+--------+-------------+
| 03 | [RESULT] | Step | <01> | |
| | | AGG | | |
| | | UNION ALL | | |
| | | Step | <02> | |
| | | AGG | | |
| 02 | [GATHER] | Step | <00> | |
| | | AGG | | |
| 01 | [GATHER] | Step | <00> | |
| | | AGG | | |
| 00 | [NO REDIST] | Table | t1[id] | |
| | | GROUP | | GROUP BY id |
+----+-------------+------------+--------+-------------+
11 rows in set (Elapsed: 00:00:00.01)
总结
对于查询的复用,特别是耗时较久的,建议用CTE方案改写。 短期内可以考虑数据库参数。