南大通用GBase 8a的global hash索引参数key_dc_size导致加载耗时波动

GBase 8a数据库集群支持global Hash索引,用于精确查询。为了方便释放空间,提供了key_dc_size参数,如果该参数和每次加载的数据量出现匹配,将会很容易出现抖动。

环境

三个节点的集群

[gbase@localhost ~]$ gcadmin
CLUSTER STATE:         ACTIVE
VIRTUAL CLUSTER MODE:  NORMAL

=============================================================
|           GBASE COORDINATOR CLUSTER INFORMATION           |
=============================================================
|   NodeName   | IpAddress  | gcware | gcluster | DataState |
-------------------------------------------------------------
| coordinator1 | 10.0.2.102 |  OPEN  |   OPEN   |     0     |
-------------------------------------------------------------
| coordinator2 | 10.0.2.202 |  OPEN  |   OPEN   |     0     |
-------------------------------------------------------------
| coordinator3 | 10.0.2.203 |  OPEN  |   OPEN   |     0     |
-------------------------------------------------------------
=========================================================================================================
|                                    GBASE DATA CLUSTER INFORMATION                                     |
=========================================================================================================
| NodeName |                IpAddress                 | DistributionId | gnode | syncserver | DataState |
---------------------------------------------------------------------------------------------------------
|  node1   |                10.0.2.102                |       3        | OPEN  |    OPEN    |     0     |
---------------------------------------------------------------------------------------------------------
|  node2   |                10.0.2.202                |       3        | OPEN  |    OPEN    |     0     |
---------------------------------------------------------------------------------------------------------
|  node3   |                10.0.2.203                |       3        | OPEN  |    OPEN    |     0     |
-----

测试数据

100万行数据

表结构

CREATE TABLE "t2" (
  "id" int(11) DEFAULT NULL,
  "name" varchar(20) DEFAULT NULL,
  KEY "idx_name" ("name") KEY_BLOCK_SIZE=4096 KEY_DC_SIZE=10 USING HASH GLOBAL
)

key_dc_size参数

key_dc_size=10, 每个DC是66636行, 10个是65万行。 3个节点接近200万行。预计每次加载100万,大约2-3次会出现一次抖动。

运行效果

和预期相符,一次快,一次慢,很有规律。速度快的24-26秒,慢的30-39秒。

gbase> truncate table t2;
Query OK, 10000041 rows affected (Elapsed: 00:00:00.50)

gbase> load data infile 'ftp://gbase:gbase1234@10.0.2.202//home/gbase/data/ran.txt' into table t2 fields terminated by ',';
Query OK, 1000000 rows affected (Elapsed: 00:00:22.68)
Task 13378 finished, Loaded 1000000 records, Skipped 0 records

gbase> load data infile 'ftp://gbase:gbase1234@10.0.2.202//home/gbase/data/ran.txt' into table t2 fields terminated by ',';
Query OK, 1000000 rows affected (Elapsed: 00:00:25.56)
Task 13379 finished, Loaded 1000000 records, Skipped 0 records

gbase> load data infile 'ftp://gbase:gbase1234@10.0.2.202//home/gbase/data/ran.txt' into table t2 fields terminated by ',';
Query OK, 1000000 rows affected (Elapsed: 00:00:22.09)
Task 13380 finished, Loaded 1000000 records, Skipped 0 records

gbase> load data infile 'ftp://gbase:gbase1234@10.0.2.202//home/gbase/data/ran.txt' into table t2 fields terminated by ',';
Query OK, 1000000 rows affected (Elapsed: 00:00:29.59)
Task 13381 finished, Loaded 1000000 records, Skipped 0 records

gbase> load data infile 'ftp://gbase:gbase1234@10.0.2.202//home/gbase/data/ran.txt' into table t2 fields terminated by ',';
Query OK, 1000000 rows affected (Elapsed: 00:00:22.81)
Task 13382 finished, Loaded 1000000 records, Skipped 0 records

gbase> load data infile 'ftp://gbase:gbase1234@10.0.2.202//home/gbase/data/ran.txt' into table t2 fields terminated by ',';
Query OK, 1000000 rows affected (Elapsed: 00:00:30.04)
Task 13383 finished, Loaded 1000000 records, Skipped 0 records

gbase> load data infile 'ftp://gbase:gbase1234@10.0.2.202//home/gbase/data/ran.txt' into table t2 fields terminated by ',';
Query OK, 1000000 rows affected (Elapsed: 00:00:26.06)
Task 13384 finished, Loaded 1000000 records, Skipped 0 records

gbase> load data infile 'ftp://gbase:gbase1234@10.0.2.202//home/gbase/data/ran.txt' into table t2 fields terminated by ',';
Query OK, 1000000 rows affected (Elapsed: 00:00:34.11)
Task 13385 finished, Loaded 1000000 records, Skipped 0 records

gbase> load data infile 'ftp://gbase:gbase1234@10.0.2.202//home/gbase/data/ran.txt' into table t2 fields terminated by ',';
Query OK, 1000000 rows affected (Elapsed: 00:00:24.57)
Task 13386 finished, Loaded 1000000 records, Skipped 0 records

gbase> load data infile 'ftp://gbase:gbase1234@10.0.2.202//home/gbase/data/ran.txt' into table t2 fields terminated by ',';
Query OK, 1000000 rows affected (Elapsed: 00:00:39.45)
Task 13387 finished, Loaded 1000000 records, Skipped 0 records

建议

key_dc_size参数,建议设置为每天或每半天数据量的计算值。简单公式

每天行数/节点数/65536, 然后取整即可。

如果是天表,那就不用了。