南大通用GBase 8a扩容缩容时重分布进度查询rebalancing_status

GBase 8a在做扩容缩容等操作时,会将需要重分布的任务,放到gclusterdb.rebalancing_status元数据表里,本文介绍该进度表的使用方法。

表结构

字段类型说明
index_namevarchar(129)任务索引,包含了库名和表名,用小数点分割
db_namevarchar(64)库名
table_namevarchar(64)表名
tmptablevarchar(129)数据迁移用到的临时表名字,一般以rebalanc开头
start_timedatetime开始时间,RUNNING状态有效
end_timedatetime完成时间,FINISHED状态有效
statusvarchar(32)任务状态。STARTING、RUNNING、COMPLETED、PAUSED、CANCELED
percentageint完成百分比,无百分号。RUNNING状态有效。默认0
priorityint优先级。越小优先级越高,默认5。
hostvarchar(60)任务所在IP
distribution_idbigint目标distribution_id
gbase> desc rebalancing_status;
+-----------------+--------------+------+-----+---------+-------+
| Field           | Type         | Null | Key | Default | Extra |
+-----------------+--------------+------+-----+---------+-------+
| index_name      | varchar(129) | YES  |     | NULL    |       |
| db_name         | varchar(64)  | YES  |     | NULL    |       |
| table_name      | varchar(64)  | YES  |     | NULL    |       |
| tmptable        | varchar(129) | YES  |     | NULL    |       |
| start_time      | datetime     | YES  |     | NULL    |       |
| end_time        | datetime     | YES  |     | NULL    |       |
| status          | varchar(32)  | YES  |     | NULL    |       |
| percentage      | int(11)      | YES  |     | NULL    |       |
| priority        | int(11)      | YES  |     | NULL    |       |
| host            | varchar(60)  | YES  |     | NULL    |       |
| distribution_id | bigint(8)    | YES  |     | NULL    |       |
+-----------------+--------------+------+-----+---------+-------+
11 rows in set (Elapsed: 00:00:00.00)

重分布任务状态

  • STARTING 准备状态,排队中
  • RUNNING 正在重分布中
  • COMPLETED 已经完成了重分布
  • PAUSED 任务暂停
  • CANCELED 任务取消

状态转换规则如下所示

GBase 8a扩容重分布状态转换示意图
  • 表处于 STARTING 状态时,Coordinator 后台线程开始执行表的 rebalance 操作,表状态转换成 RUNNING。
  • 表处于 PAUSED 状态时,对这个表执行 continue rebalance 操作,表状态转换成RUNNING。
  • 表处于 RUNNING 状态时,coordinator 后台线程执行表的 rebalance 操作失败,表状态转换成 STARTING。
  • 表处于 RUNNING 状态时,coordinator 后台线程完成了表的 rebalance 操作,表状态转换成 COMPLETED。

查询总体重分布进度

根据STATUS统计每个状态的数量。如下例子,等待中的15个,正在重分布的3个,完成了3个。

gbase> select status,count(*) from gclusterdb.rebalancing_status group by status;
+-----------+----------+
| status    | count(*) |
+-----------+----------+
| STARTING  |       15 |
| RUNNING   |        3 |
| COMPLETED |        3 |
+-----------+----------+
3 rows in set (Elapsed: 00:00:00.61)

当所有的表都处于COMPLETED状态,则重分布全部完成。

gbase> select status,count(*) from gclusterdb.rebalancing_status group by status;
+-----------+----------+
| status    | count(*) |
+-----------+----------+
| COMPLETED |       21 |
+-----------+----------+
1 row in set (Elapsed: 00:00:00.19)

查看正在重分布的任务进度

查看状态为RUNNING的进度。如下是4个正在运行的任务。

gbase> select * from rebalancing_status where status='RUNNING';
+-----------------------------------+------------+----------------------------+----------+----------------------------+------------+-----------+------------+----------+----------+-----------------+
| index_name                        | db_name    | table_name                 | tmptable | start_time                 | end_time   | status    | percentage | priority | host     | distribution_id |
+-----------------------------------+------------+----------------------------+----------+----------------------------+------------+-----------+------------+----------+----------+-----------------+
| testdb.t1                         | testdb     | t1                         |          | 2018-11-07 09:05:35.216000 |            | RUNNING   |        30  |        5 | 2001::61 |               2 |
| testdb.aggr_http_ci_hour_20180904 | testdb     | aggr_http_ci_hour_20180904 |          | 2018-11-07 09:05:35.276000 |            | RUNNING   |        10  |        5 | 2001::61 |               2 |
| testdb.dim_ci                     | testdb     | dim_ci                     |          | 2018-11-07 09:05:35.333000 |            | RUNNING   |        50  |        5 | 2001::61 |               2 |
| gclusterdb.audit_log_express      | gclusterdb | audit_log_express          |          | 2018-11-07 09:05:35.251000 |            | RUNNING   |        20  |        5 | 2001::61 |               2 |
+-----------------------------------+------------+----------------------------+----------+----------------------------+------------+-----------+------------+----------+----------+-----------------+
4 rows in set (Elapsed: 00:00:00.02)

需要注意,默认重分布时,每个节点,每次处理的数据是通过参数(gcluster_rebalancing_step)配置的,在V8里默认是2000万行,在V95里是1亿行。

再次提醒,是每个节点。比如100一行,10个节点平均,则每个节点处理10亿行,需要处理10轮。对应的percentage在处理时会变动。如果节点处理的数据少于参数值,则一次性处理,percentage从0直接到99或者100。

暂定重分布

pause rebalance table 库名.表名

以rebalancing_status里面的index_name为准。

gbase> select * from rebalancing_status;
+---------------------------+---------+--------------------+----------+----------------------------+----------+----------+------------+----------+------+-----------------+
| index_name                | db_name | table_name         | tmptable | start_time                 | end_time | status   | percentage | priority | host | distribution_id |
+---------------------------+---------+--------------------+----------+----------------------------+----------+----------+------------+----------+------+-----------------+
| testdb.t1                 | testdb  | t1                 | NULL     | 2022-06-01 09:49:42.633000 | NULL     | STARTING |          0 |        5 | NULL |               4 |
| testdb.t2                 | testdb  | t2                 | NULL     | 2022-06-01 09:49:42.633000 | NULL     | STARTING |          0 |        5 | NULL |               4 |
| testdb.t_procedure        | testdb  | t_procedure        | NULL     | 2022-06-01 09:49:42.633000 | NULL     | STARTING |          0 |        5 | NULL |               4 |
| testdb.t_procedure_testdb | testdb  | t_procedure_testdb | NULL     | 2022-06-01 09:49:42.633000 | NULL     | STARTING |          0 |        5 | NULL |               4 |
| testdb.t_enc              | testdb  | t_enc              | NULL     | 2022-06-01 09:49:42.633000 | NULL     | STARTING |          0 |        5 | NULL |               4 |
| testdb.ta                 | testdb  | ta                 | NULL     | 2022-06-01 09:49:42.633000 | NULL     | STARTING |          0 |        5 | NULL |               4 |
+---------------------------+---------+--------------------+----------+----------------------------+----------+----------+------------+----------+------+-----------------+
6 rows in set (Elapsed: 00:00:00.01)

gbase> pause rebalance table testdb.t1;
Query OK, 1 row affected (Elapsed: 00:00:00.08)

gbase> select * from rebalancing_status;
+---------------------------+---------+--------------------+----------+----------------------------+----------+----------+------------+----------+------+-----------------+
| index_name                | db_name | table_name         | tmptable | start_time                 | end_time | status   | percentage | priority | host | distribution_id |
+---------------------------+---------+--------------------+----------+----------------------------+----------+----------+------------+----------+------+-----------------+
| testdb.t1                 | testdb  | t1                 | NULL     | 2022-06-01 09:49:42.633000 | NULL     | PAUSED   |          0 |        5 | NULL |               4 |
| testdb.t2                 | testdb  | t2                 | NULL     | 2022-06-01 09:49:42.633000 | NULL     | STARTING |          0 |        5 | NULL |               4 |
| testdb.t_procedure        | testdb  | t_procedure        | NULL     | 2022-06-01 09:49:42.633000 | NULL     | STARTING |          0 |        5 | NULL |               4 |
| testdb.t_procedure_testdb | testdb  | t_procedure_testdb | NULL     | 2022-06-01 09:49:42.633000 | NULL     | STARTING |          0 |        5 | NULL |               4 |
| testdb.t_enc              | testdb  | t_enc              | NULL     | 2022-06-01 09:49:42.633000 | NULL     | STARTING |          0 |        5 | NULL |               4 |
| testdb.ta                 | testdb  | ta                 | NULL     | 2022-06-01 09:49:42.633000 | NULL     | STARTING |          0 |        5 | NULL |               4 |
+---------------------------+---------+--------------------+----------+----------------------------+----------+----------+------------+----------+------+-----------------+
6 rows in set (Elapsed: 00:00:00.14)

继续重分布

continue rebalance table 库名.表名

以rebalancing_status里面的index_name为准。

gbase> select * from rebalancing_status;
+---------------------------+---------+--------------------+----------+----------------------------+----------+----------+------------+----------+------+-----------------+
| index_name                | db_name | table_name         | tmptable | start_time                 | end_time | status   | percentage | priority | host | distribution_id |
+---------------------------+---------+--------------------+----------+----------------------------+----------+----------+------------+----------+------+-----------------+
| testdb.t1                 | testdb  | t1                 | NULL     | 2022-06-01 09:49:42.633000 | NULL     | PAUSED   |          0 |        5 | NULL |               4 |
| testdb.t2                 | testdb  | t2                 | NULL     | 2022-06-01 09:49:42.633000 | NULL     | STARTING |          0 |        5 | NULL |               4 |
| testdb.t_procedure        | testdb  | t_procedure        | NULL     | 2022-06-01 09:49:42.633000 | NULL     | STARTING |          0 |        5 | NULL |               4 |
| testdb.t_procedure_testdb | testdb  | t_procedure_testdb | NULL     | 2022-06-01 09:49:42.633000 | NULL     | STARTING |          0 |        5 | NULL |               4 |
| testdb.t_enc              | testdb  | t_enc              | NULL     | 2022-06-01 09:49:42.633000 | NULL     | STARTING |          0 |        5 | NULL |               4 |
| testdb.ta                 | testdb  | ta                 | NULL     | 2022-06-01 09:49:42.633000 | NULL     | STARTING |          0 |        5 | NULL |               4 |
+---------------------------+---------+--------------------+----------+----------------------------+----------+----------+------------+----------+------+-----------------+
6 rows in set (Elapsed: 00:00:00.14)

gbase>
gbase>
gbase>
gbase> continue rebalance table testdb.t1;
Query OK, 1 row affected (Elapsed: 00:00:00.08)

gbase> select * from rebalancing_status;
+---------------------------+---------+--------------------+----------+----------------------------+----------------------------+-----------+------------+----------+------------+-----------------+
| index_name                | db_name | table_name         | tmptable | start_time                 | end_time                   | status    | percentage | priority | host       | distribution_id |
+---------------------------+---------+--------------------+----------+----------------------------+----------------------------+-----------+------------+----------+------------+-----------------+
| testdb.t1                 | testdb  | t1                 | NULL     | 2022-06-01 09:49:42.633000 | NULL                       | STARTING  |          0 |        5 | NULL       |               4 |
| testdb.t2                 | testdb  | t2                 | NULL     | 2022-06-01 09:49:42.633000 | NULL                       | STARTING  |          0 |        5 | NULL       |               4 |
| testdb.t_procedure        | testdb  | t_procedure        | NULL     | 2022-06-01 09:49:42.633000 | NULL                       | STARTING  |          0 |        5 | NULL       |               4 |
| testdb.t_procedure_testdb | testdb  | t_procedure_testdb | NULL     | 2022-06-01 09:49:42.633000 | NULL                       | STARTING  |          0 |        5 | NULL       |               4 |
| testdb.t_enc              | testdb  | t_enc              | NULL     | 2022-06-01 09:49:42.633000 | NULL                       | STARTING  |          0 |        5 | NULL       |               4 |
| testdb.ta                 | testdb  | ta                 | NULL     | 2022-06-01 09:56:27.395000 | 2022-06-01 09:56:27.395000 | COMPLETED |          0 |        5 | 10.0.2.101 |               4 |
+---------------------------+---------+--------------------+----------+----------------------------+----------------------------+-----------+------------+----------+------------+-----------------+
6 rows in set (Elapsed: 00:00:00.01)

取消重分布

cancel rebalance table 库名.表名

以rebalancing_status里面的index_name为准。

不保证一定能成功。

总结

通过统计已经FINISHED状态的表数量,计算出整体进度。通过查询RUNNING状态的数据,查看当前表的进度。

由于不同的表,数据量不同,耗时不同,所以通过进度来评估预计耗时,并没有那么精确,只能作为参考。用户可以根据业务数据量情况,划分不同的优先级,以此来评估每个不同优先级的完成进度,相对更准确些。