南大通用GBase 8a 对主副本不一致时的处理方案测试

由于某些原因,主副本出现了不一致,包括表结构,数量等。GBase 8a的新版本,对这种情况做了处理,本文介绍相关参数和测试用例。

参考

GBase 8a执行DML报主副本不一致file_size_limit不同

GBase 8a发生主副本都损坏状态为1的几种原因

总结

经过测试,GBase 8a的新功能,可以在表数据行数不一致,表结构不一致,表SCN号不一致等多种情况下,自动判断并恢复。

不一致原因

从个人项目看,出现过如下几种情况,导致主副本不一致

参数不同

数据库某些参数不同,导致主副本的不一致。由于参数都是本地配置,其一致性需要用户自行保障。 实例请看参考的文章。

服务器断电、死机

对于物理机,一般是RAID卡有问题,包括驱动BUG,无电池等。 对于虚拟机,云主机,实例异常退出,断电等。

GBase 8a在写入数据时,是直接方式(direct io), 不是缓冲写(先写到操作系统的脏数据 Dirty data,再依赖操作系统的刷新),所以只有写入返回成功时,数据库内部才认定写入成功。但如果底层环境出现了上述的问题,则成功写入的数据,并不能保证一定刷到磁盘上了。

手工误操作

最常见的情况是节点【临时】故障期间,不小心清掉了该节点的Event。

参数

gcluster_suffix_consistency_resolve

默认值为0,不解决分片一致性问题。 设置为1,尝试解决。

该参数支持session级和global级。

注意:

  • 该参数为近期(2022-09-02)新增加的,您如果要使用,需要看版本是否支持。目前应覆盖到862Build33、862Build43和9.5系列。
  • 该功能最少要3个主机节点才可以。2节点集群不保证所有功能可以工作。

测试环境

862Build33的新版本,3节点集群。

节点

[gbase@862b43 gcinstall_862.33.35]$ gcadmin
CLUSTER STATE:  ACTIVE
CLUSTER MODE:   NORMAL

==========================================================
|         GBASE COORDINATOR CLUSTER INFORMATION          |
==========================================================
|   NodeName   | IpAddress  |gcware |gcluster |DataState |
----------------------------------------------------------
| coordinator1 | 10.0.2.106 | OPEN  |  OPEN   |    0     |
----------------------------------------------------------
| coordinator2 | 10.0.2.107 | OPEN  |  OPEN   |    0     |
----------------------------------------------------------
| coordinator3 | 10.0.2.108 | OPEN  |  OPEN   |    0     |
----------------------------------------------------------
======================================================
|           GBASE DATA CLUSTER INFORMATION           |
======================================================
|NodeName | IpAddress  |gnode |syncserver |DataState |
------------------------------------------------------
|  node1  | 10.0.2.106 | OPEN |   OPEN    |    0     |
------------------------------------------------------
|  node2  | 10.0.2.107 | OPEN |   OPEN    |    0     |
------------------------------------------------------
|  node3  | 10.0.2.108 | OPEN |   OPEN    |    0     |
------------------------------------------------------

软件

[gbase@862b43 gcluster]$ gccli testdb

GBase client 8.6.2.33-R48.274e5f5e. Copyright (c) 2004-2022, GBase.  All Rights Reserved.

gbase> select version();
+-----------------------+
| version()             |
+-----------------------+
| 8.6.2.33-R48.274e5f5e |
+-----------------------+
1 row in set (Elapsed: 00:00:00.01)

数据

2字段的分布表,方便观察节点的数据。

初始数据

手工插入9行数据,确认每个分片都有数据。


gbase> create database testdb;
Query OK, 1 row affected (Elapsed: 00:00:00.03)

gbase> use testdb;
Query OK, 0 rows affected (Elapsed: 00:00:00.01)

gbase> create table t1(id int, name varchar(100))distributed by('id');
Query OK, 0 rows affected (Elapsed: 00:00:00.26)

gbase> insert into t1 values(1,1111),(2,2222),(3,33333),(4,44444),(5,5555),(6,66666),(7,777777),(8,8888),(9,9999);
Query OK, 9 rows affected (Elapsed: 00:00:00.26)
Records: 9  Duplicates: 0  Warnings: 0

gbase> select * from t1;
+------+--------+
| id   | name   |
+------+--------+
|    4 | 44444  |
|    6 | 66666  |
|    7 | 777777 |
|    1 | 1111   |
|    3 | 33333  |
|    5 | 5555   |
|    9 | 9999   |
|    2 | 2222   |
|    8 | 8888   |
+------+--------+
9 rows in set (Elapsed: 00:00:00.01)

检查分片,每个都有数据

[gbase@862b43 gcinstall_862.33.35]$ gncli testb
ERROR 1049 (42000): Unknown database 'testb'
[gbase@862b43 gcinstall_862.33.35]$ gncli testdb

GBase client 8.6.2.33-R48.274e5f5e. Copyright (c) 2004-2022, GBase.  All Rights Reserved.

gbase> select * from t1_n1;
+------+--------+
| id   | name   |
+------+--------+
|    4 | 44444  |
|    6 | 66666  |
|    7 | 777777 |
+------+--------+
3 rows in set (Elapsed: 00:00:00.00)

gbase> select * from t1_n2;
ERROR 1146 (42S02): Table 'testdb.t1_n2' doesn't exist
gbase> select * from t1_n3;
+------+-------+
| id   | name  |
+------+-------+
|    1 | 1111  |
|    3 | 33333 |
|    5 | 5555  |
|    9 | 9999  |
+------+-------+
4 rows in set (Elapsed: 00:00:00.00)

gbase> ^CAborted
[gbase@862b43 gcinstall_862.33.35]$ gncli testdb -h 10.0.2.107

GBase client 8.6.2.33-R48.274e5f5e. Copyright (c) 2004-2022, GBase.  All Rights Reserved.

gbase> select * from t1_n2;
+------+------+
| id   | name |
+------+------+
|    2 | 2222 |
|    8 | 8888 |
+------+------+
2 rows in set (Elapsed: 00:00:00.01)

加载数据

6行数据

[gbase@862b43 ~]$ cat t1.txt
1,1111
3,3333
2,2222
4,4444
5,5555
6,6666

测试分片行数不一致场景

构造故障

手工向分片插入数据。注意SCN号要保持一致。

[gbase@862b43 gcluster]$ gncli testdb

GBase client 8.6.2.33-R48.274e5f5e. Copyright (c) 2004-2022, GBase.  All Rights Reserved.

gbase> select scn from information_schema.tables where table_name='t1_n1';
+------+
| scn  |
+------+
|   22 |
+------+
1 row in set (Elapsed: 00:00:00.00)

gbase> SET SELF SCN =22;
Query OK, 0 rows affected (Elapsed: 00:00:00.00)

gbase> select * from t1_n1;
+------+--------+
| id   | name   |
+------+--------+
|    4 | 44444  |
|    6 | 66666  |
|    7 | 777777 |
|    4 | 4444   |
|    6 | 6666   |
|    4 | 4444   |
|    6 | 6666   |
|    4 | 4444   |
|    6 | 6666   |
+------+--------+
9 rows in set (Elapsed: 00:00:00.01)

gbase> insert into t1_n1 values(4,4444444444444);
Query OK, 1 row affected (Elapsed: 00:00:00.05)
Records: 1  Duplicates: 0  Warnings: 0 Total: 10

gbase> select scn from information_schema.tables where table_name='t1_n1';
+------+
| scn  |
+------+
|   22 |
+------+
1 row in set (Elapsed: 00:00:00.00)

加载测试

先连续加载2次,均报错。设置参数后,再加载一次,报错,后台会自动同步,再加载成功。

注意报错内容是:Column 0, no_obj: 10 -- 9 行数不同


gbase> load data infile 'ftp://gbase:gbase1234@10.0.2.106/t1.txt' into table t1 fields terminated by ',';
ERROR 1733 (HY000): (GBA-01EX-700) Gbase general error: Task 24 failed, [10.0.2.107:5050](GBA-02AD-0005)Failed to query in gnode:
DETAIL: (GBA-01EX-700) Gbase general error: (gns_host: 10.0.2.106) source table and destination table are not same: src:( localhost:testdb.t1_n1 ) dst:( 10.0.2.107:testdb.t1_n1 ) Column 0, no_obj: 10 -- 9
SQL: LOAD /*+ TID('168') */ DATA INFILE 'ftp://gbase:*********@10.0.2.106/t1.txt#offset=0&length=42&firstblock&ffsize=42' INTO TABLE `testdb`.`t1_n1` DATA_FORMAT 3 FILE_FORMAT UNDEFINED  FIELDS TE
gbase> load data infile 'ftp://gbase:gbase1234@10.0.2.106/t1.txt' into table t1 fields terminated by ',';
ERROR 1733 (HY000): (GBA-01EX-700) Gbase general error: Task 25 failed, [10.0.2.106:5050](GBA-02AD-0005)Failed to query in gnode:
DETAIL: (GBA-01EX-700) Gbase general error: (gns_host: 10.0.2.106) source table and destination table are not same: src:( localhost:testdb.t1_n1 ) dst:( 10.0.2.107:testdb.t1_n1 ) Column 0, no_obj: 10 -- 9
SQL: LOAD /*+ TID('170') */ DATA INFILE 'ftp://gbase:*********@10.0.2.106/t1.txt#offset=0&length=42&firstblock&ffsize=42' INTO TABLE `testdb`.`t1_n1` DATA_FORMAT 3 FILE_FORMAT UNDEFINED  FIELDS TE
gbase> set gcluster_suffix_consistency_resolve=1;
Query OK, 0 rows affected (Elapsed: 00:00:00.01)

gbase> load data infile 'ftp://gbase:gbase1234@10.0.2.106/t1.txt' into table t1 fields terminated by ',';
ERROR 1733 (HY000): (GBA-01EX-700) Gbase general error: Task 26 failed, [10.0.2.106:5050](GBA-02AD-0005)Failed to query in gnode:
DETAIL: (GBA-01EX-700) Gbase general error: (gns_host: 10.0.2.106) source table and destination table are not same: src:( localhost:testdb.t1_n1 ) dst:( 10.0.2.107:testdb.t1_n1 ) Column 0, no_obj: 10 -- 9
SQL: LOAD /*+ TID('173') */ DATA INFILE 'ftp://gbase:*********@10.0.2.106/t1.txt#offset=0&length=42&firstblock&ffsize=42' INTO TABLE `testdb`.`t1_n1` DATA_FORMAT 3 FILE_FORMAT UNDEFINED  FIELDS TE
gbase> load data infile 'ftp://gbase:gbase1234@10.0.2.106/t1.txt' into table t1 fields terminated by ',';
Query OK, 6 rows affected (Elapsed: 00:00:00.50)
Task 27 finished, Loaded 6 records, Skipped 0 records


gbase>

查看日志

确认有不一致处理和同步信息

连接节点 106 gcluster的express.log

注意其中的SUFFIX_INCONSISTENCY_JUDGER部分,设置了storageevent 4

2022-09-02 10:30:47.823 [SQLDISP][ERROR][S:74][Q:176]:Query failed, SessionCtx(0x5390000) HOST(10.0.2.106:5050->98) reason: (GBA-01EX-700) Gbase general error: (gns_host: 10.0.2.106) source table and destination table are not same: src:( localhost:testdb.t1_n1 ) dst:( 10.0.2.107:testdb.t1_n1 ) Column 0, no_obj: 10 -- 9 SQL(LOAD /*+ TID('170') */ DATA INFILE 'ftp://gbase:*********@10.0.2.106/t1.txt#offset=0&length=42&firstblock&ffsize=42' INTO TABLE `testdb`.`t1_n1` DATA_FORMAT 3 FILE_FORMAT UNDEFINED  FIELDS TERMINATED BY ',' HOST '10.0.2.106' CURRENT_TIMESTAMP 1662085847 SCN_NUMBER 25 GCLUSTER_PORT 5258 INTO SERVER (HOST '10.0.2.106, 10.0.2.107, 10.0.2.108', PORT '5050', USER 'root', DATABASE 'testdb', TABLE 't1_n1, t1_n2, t1_n3', COMMENT 'table_host 0 0 1 # 1 1 2 # 2 2 0, scn 25, group 0 ,distribution 1, hash_map 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1
2022-09-02 10:30:47.847 [LOAD][ERROR][S:74][Q:176]:Task 25 failed, [10.0.2.106:5050](GBA-02AD-0005)Failed to query in gnode:
DETAIL: (GBA-01EX-700) Gbase general error: (gns_host: 10.0.2.106) source table and destination table are not same: src:( localhost:testdb.t1_n1 ) dst:( 10.0.2.107:testdb.t1_n1 ) Column 0, no_obj: 10 -- 9
SQL: LOAD /*+ TID('170') */ DATA INFILE 'ftp://gbase:*********@10.0.2.106/t1.txt#offset=0&length=42&firstblock&ffsize=42' INTO TABLE `testdb`.`t1_n1` DATA_FORMAT 3 FILE_FORMAT UNDEFINED  FIELDS TERMINATED BY ',' HOST '10.0.2.106' CURRENT_TIMESTAMP 1662085847 SCN_NUMBER 25 GCLUSTER_PORT 5258 INTO SERVER (HOST '10.0.2.106, 10.0.2.107, 10.0.2.108', PORT '5050', USER 'root', DATABASE 'testdb', TABLE 't1_n1, t1_n2, t1_n3', COMMENT 'table_host 0 0 1 # 1 1 2 # 2 2 0, scn 25, group 0 ,distribution 1, hash_map 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 ...  1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1
2022-09-02 10:30:47.973 [SQLDISP][ERROR][S:74][Q:176]:Query failed, SessionCtx(0x5390000) HOST(10.0.2.106:5050->92) reason: Can't revert table: Table testdb.t1_n1 can not do revert with scn(25/cur:22) SQL(revert table "testdb"."t1_n1" scn_number 25)
2022-09-02 10:30:47.983 [SQLDISP][ERROR][S:74][Q:176]:Query failed, SessionCtx(0x5390000) HOST(10.0.2.107:5050->96) reason: Can't revert table: Table testdb.t1_n1 can not do revert with scn(25/cur:22) SQL(revert table "testdb"."t1_n1" scn_number 25)
2022-09-02 10:30:47.983 [SQLDISP][ERROR][S:74][Q:176]:Query failed, SessionCtx(0x5390000) HOST(10.0.2.108:5050->86) reason: Can't revert table: Table testdb.t1_n2 can not do revert with scn(25/cur:22) SQL(revert table "testdb"."t1_n2" scn_number 25)
2022-09-02 10:30:47.983 [SQLDISP][ERROR][S:74][Q:176]:Query failed, SessionCtx(0x5390000) HOST(10.0.2.107:5050->97) reason: Can't revert table: Table testdb.t1_n2 can not do revert with scn(25/cur:22) SQL(revert table "testdb"."t1_n2" scn_number 25)
2022-09-02 10:30:47.983 [SQLDISP][ERROR][S:74][Q:176]:Query failed, SessionCtx(0x5390000) HOST(10.0.2.108:5050->87) reason: Can't revert table: Table testdb.t1_n3 can not do revert with scn(25/cur:22) SQL(revert table "testdb"."t1_n3" scn_number 25)
2022-09-02 10:30:47.983 [SQLDISP][ERROR][S:74][Q:176]:Query failed, SessionCtx(0x5390000) HOST(10.0.2.106:5050->93) reason: Can't revert table: Table testdb.t1_n3 can not do revert with scn(25/cur:22) SQL(revert table "testdb"."t1_n3" scn_number 25)
2022-09-02 10:30:47.983 [COMMIT][WARN ][S:74][Q:176]:revert in host:10.0.2.106 for table:t1 success.
2022-09-02 10:30:47.983 [COMMIT][WARN ][S:74][Q:176]:revert in host:10.0.2.107 for table:t1 success.
2022-09-02 10:30:47.983 [COMMIT][WARN ][S:74][Q:176]:revert in host:10.0.2.107 for table:t1 success.
2022-09-02 10:30:47.983 [COMMIT][WARN ][S:74][Q:176]:revert in host:10.0.2.108 for table:t1 success.
2022-09-02 10:30:47.983 [COMMIT][WARN ][S:74][Q:176]:revert in host:10.0.2.108 for table:t1 success.
2022-09-02 10:30:47.983 [COMMIT][WARN ][S:74][Q:176]:revert in host:10.0.2.106 for table:t1 success.
2022-09-02 10:30:47.996 [COMMIT][INFO ][S:74][Q:176]:rollback table(testdb.t1) success for scn:25.
2022-09-02 10:30:56.316 [SQLDISP][ERROR][S:74][Q:178]:Query failed, SessionCtx(0x5390000) HOST(10.0.2.106:5050->98) reason: (GBA-01EX-700) Gbase general error: (gns_host: 10.0.2.106) source table and destination table are not same: src:( localhost:testdb.t1_n1 ) dst:( 10.0.2.107:testdb.t1_n1 ) Column 0, no_obj: 10 -- 9 SQL(LOAD /*+ TID('173') */ DATA INFILE 'ftp://gbase:*********@10.0.2.106/t1.txt#offset=0&length=42&firstblock&ffsize=42' INTO TABLE `testdb`.`t1_n1` DATA_FORMAT 3 FILE_FORMAT UNDEFINED  FIELDS TERMINATED BY ',' HOST '10.0.2.106' CURRENT_TIMESTAMP 1662085856 SCN_NUMBER 26 GCLUSTER_PORT 5258 INTO SERVER (HOST '10.0.2.106, 10.0.2.107, 10.0.2.108', PORT '5050', USER 'root', DATABASE 'testdb', TABLE 't1_n1, t1_n2, t1_n3', COMMENT 'table_host 0 0 1 # 1 1 2 # 2 2 0, scn 26, group 0 ,distribution 1, hash_map 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1
2022-09-02 10:30:56.349 [SUFFIX_INCONSISTENCY_JUDGER][WARN ][S:74][Q:178]:row count:  10.0.2.106:10 < 10.0.2.107:9
2022-09-02 10:30:56.349 [SUFFIX_INCONSISTENCY_JUDGER][WARN ][S:74][Q:178]:host 10.0.2.106 table:testdb.t1_n1 judeged by compare row count.
2022-09-02 10:30:56.363 [SUFFIX_INCONSISTENCY_JUDGER][WARN ][S:74][Q:178]:Set testdb.t1 suffix n1 StorageEvent successfully, event id is 4

2022-09-02 10:30:56.364 [LOAD][ERROR][S:74][Q:178]:Task 26 failed, [10.0.2.106:5050](GBA-02AD-0005)Failed to query in gnode:
DETAIL: (GBA-01EX-700) Gbase general error: (gns_host: 10.0.2.106) source table and destination table are not same: src:( localhost:testdb.t1_n1 ) dst:( 10.0.2.107:testdb.t1_n1 ) Column 0, no_obj: 10 -- 9
SQL: LOAD /*+ TID('173') */ DATA INFILE 'ftp://gbase:*********@10.0.2.106/t1.txt#offset=0&length=42&firstblock&ffsize=42' INTO TABLE `testdb`.`t1_n1` DATA_FORMAT 3 FILE_FORMAT UNDEFINED  FIELDS TERMINATED BY ',' HOST '10.0.2.106' CURRENT_TIMESTAMP 1662085856 SCN_NUMBER 26 GCLUSTER_PORT 5258 INTO SERVER (HOST '10.0.2.106, 10.0.2.107, 10.0.2.108', PORT '5050', USER 'root', DATABASE 'testdb', TABLE 't1_n1, t1_n2, t1_n3', COMMENT 'table_host 0 0 1 # 1 1 2 # 2 2 0, scn 26, group 0 ,distribution 1, hash_map 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 ...  1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1
2022-09-02 10:30:56.488 [SQLDISP][ERROR][S:74][Q:178]:Query failed, SessionCtx(0x5390000) HOST(10.0.2.106:5050->92) reason: Can't revert table: Table testdb.t1_n1 can not do revert with scn(26/cur:22) SQL(revert table "testdb"."t1_n1" scn_number 26)
2022-09-02 10:30:56.499 [SQLDISP][ERROR][S:74][Q:178]:Query failed, SessionCtx(0x5390000) HOST(10.0.2.107:5050->96) reason: Can't revert table: Table testdb.t1_n1 can not do revert with scn(26/cur:22) SQL(revert table "testdb"."t1_n1" scn_number 26)
2022-09-02 10:30:56.499 [SQLDISP][ERROR][S:74][Q:178]:Query failed, SessionCtx(0x5390000) HOST(10.0.2.108:5050->86) reason: Can't revert table: Table testdb.t1_n2 can not do revert with scn(26/cur:22) SQL(revert table "testdb"."t1_n2" scn_number 26)
2022-09-02 10:30:56.499 [SQLDISP][ERROR][S:74][Q:178]:Query failed, SessionCtx(0x5390000) HOST(10.0.2.107:5050->97) reason: Can't revert table: Table testdb.t1_n2 can not do revert with scn(26/cur:22) SQL(revert table "testdb"."t1_n2" scn_number 26)
2022-09-02 10:30:56.499 [SQLDISP][ERROR][S:74][Q:178]:Query failed, SessionCtx(0x5390000) HOST(10.0.2.108:5050->87) reason: Can't revert table: Table testdb.t1_n3 can not do revert with scn(26/cur:22) SQL(revert table "testdb"."t1_n3" scn_number 26)
2022-09-02 10:30:56.499 [SQLDISP][ERROR][S:74][Q:178]:Query failed, SessionCtx(0x5390000) HOST(10.0.2.106:5050->93) reason: Can't revert table: Table testdb.t1_n3 can not do revert with scn(26/cur:22) SQL(revert table "testdb"."t1_n3" scn_number 26)
2022-09-02 10:30:56.499 [COMMIT][WARN ][S:74][Q:178]:revert in host:10.0.2.106 for table:t1 success.
2022-09-02 10:30:56.499 [COMMIT][WARN ][S:74][Q:178]:revert in host:10.0.2.107 for table:t1 success.
2022-09-02 10:30:56.499 [COMMIT][WARN ][S:74][Q:178]:revert in host:10.0.2.107 for table:t1 success.
2022-09-02 10:30:56.499 [COMMIT][WARN ][S:74][Q:178]:revert in host:10.0.2.108 for table:t1 success.
2022-09-02 10:30:56.499 [COMMIT][WARN ][S:74][Q:178]:revert in host:10.0.2.108 for table:t1 success.
2022-09-02 10:30:56.499 [COMMIT][WARN ][S:74][Q:178]:revert in host:10.0.2.106 for table:t1 success.
2022-09-02 10:30:56.512 [COMMIT][INFO ][S:74][Q:178]:rollback table(testdb.t1) success for scn:26.
2022-09-02 10:30:58.208 [LOAD][WARN ][S:74][Q:179]:slice(n1) on node(10.0.2.106) is invalid.
2022-09-02 10:30:58.480 [DEF  ] [S:74][Q:179][load]Node(10.0.2.106)testdb.t1[n1][scn:27]'s data state fails
2022-09-02 10:30:58.480 [DEF  ] [S:74][Q:179]Set data(testdb.t1:1) to LOCKED
2022-09-02 10:30:58.500 [DEF  ] [S:74][Q:179]Set data(testdb.t1:normal) to LOCKED, event id is 4
2022-09-02 10:30:58.670 [DEF  ] [S:74][Q:179][load]Node(10.0.2.106)testdb.t1[n1][scn:27]'s data state fails
2022-09-02 10:30:58.670 [DEF  ] [S:74][Q:179]Set data(testdb.t1:1) to LOCKED
2022-09-02 10:30:58.687 [DEF  ] [S:74][Q:179]Set data(testdb.t1:normal) to LOCKED, event id is 4

gc_recovery.log

2022-09-02 10:31:17.789 [INFO ] <RECOVER-INFO-1>: Start Recovering testdb.t1 tid 7
2022-09-02 10:31:17.789 [INFO ] <DATA-Recover-1>: Start dml recover testdb.t1,tid 7
2022-09-02 10:31:17.789 [INFO ] <DATA-Recover-1>: Start to dml recover of eventid(4)
2022-09-02 10:31:17.801 [INFO ] <GCWare>: GetSourceNodeForDMl for nodeId(1778515978), the node is unValid
2022-09-02 10:31:17.802 [INFO ] <DATA-Recover-1>: DoNormSqlNoResult, do(flush rollback `testdb`.`t1_n1`)
2022-09-02 10:31:17.805 [INFO ] <DATA-Recover-1>: recoverinfo gn dml: eventid:4 DB(testdb), TABLE(t1), SLICE(1), TID(7), src:0x1795293194(10.0.2.107) dst:0x1778515978(10.0.2.106)
2022-09-02 10:31:17.805 [INFO ] <CLockResource>: Lock: testdb start
2022-09-02 10:31:17.806 [INFO ] <CLockResource>: Lock: testdb end success
2022-09-02 10:31:17.806 [INFO ] <CLockResource>: Lock: testdb.t1.rsync start
2022-09-02 10:31:17.807 [INFO ] <CLockResource>: Lock: testdb.t1.rsync end success
2022-09-02 10:31:17.807 [INFO ] <CLockResource>: Lock: testdb.t1824A6746-7C28-4B39-B1E6-D6E088F858AA start
2022-09-02 10:31:17.809 [INFO ] <CLockResource>: Lock: testdb.t1824A6746-7C28-4B39-B1E6-D6E088F858AA end success
2022-09-02 10:31:17.809 [INFO ] <CLockResource>: Lock: testdb.t1D88707C8-BB5F-41D8-AAF9-55A76780953B start
2022-09-02 10:31:17.811 [INFO ] <CLockResource>: Lock: testdb.t1D88707C8-BB5F-41D8-AAF9-55A76780953B end success
2022-09-02 10:31:17.811 [INFO ] <DATA-Recover-1>: DO_DATA_SYNC, Lock_for_ddl
2022-09-02 10:31:17.811 [INFO ] <DATA-Recover-1>: SYNC_SLICE, DISTRIBUTED TABLE: DB: 'testdb', Table: 't1', slice: 'n1'
2022-09-02 10:31:17.811 [INFO ] <GCWare>: testdb.t1 nodeid: 1778515978, segid: 1, tableid 7 have dmlstorageevent,total num 1,lasteventid: 4
2022-09-02 10:31:17.812 [INFO ] <DATA-Recover-1>: SYNC_SLICE, Find it is a meta broken table testdb.t1,and the event will be given up,event id 4
2022-09-02 10:31:17.825 [INFO ] <DATA-Recover-1>: SET_ONLINE, Succeeded in setting local node table 'testdb.t1_n1' online, eventinfo(eventid:4,segId:1)
2022-09-02 10:31:17.826 [INFO ] <CLockResource>: UnLock: testdb unlock success
2022-09-02 10:31:17.828 [INFO ] <CLockResource>: UnLock: testdb.t1.rsync unlock success
2022-09-02 10:31:17.829 [INFO ] <CLockResource>: UnLock: testdb.t1824A6746-7C28-4B39-B1E6-D6E088F858AA unlock success
2022-09-02 10:31:17.831 [INFO ] <CLockResource>: UnLock: testdb.t1D88707C8-BB5F-41D8-AAF9-55A76780953B unlock success
2022-09-02 10:31:17.832 [INFO ] <DATA-Recover-1>: End dml recover testdb.t1,tid 7
2022-09-02 10:31:17.832 [INFO ] <STORAGE-Recover-1>: Start DMLStorge recover testdb.t1,tid 7 eventnum 1
2022-09-02 10:31:17.832 [INFO ] <STORAGE-Recover-1>: Start to DMLStorge recover of eventid(4)
2022-09-02 10:31:17.833 [INFO ] <STORAGE-Recover-1>: recoverinfo gn(dmls): eventid=4, tablename=testdb.t1, slice:n1 src:0x1795293194(10.0.2.106) dst:0x1778515978(10.0.2.106)
2022-09-02 10:31:17.833 [INFO ] <STORAGE-Recover-1>: Revovering ..., for dmlstorage, forgcluster=false, dbname=testdb, tablename=t1, slice=n1, tid=7, eventid=4
2022-09-02 10:31:17.846 [INFO ] <CLockResource>: Lock: testdb start
2022-09-02 10:31:17.848 [INFO ] <CLockResource>: Lock: testdb end success
2022-09-02 10:31:17.848 [INFO ] <CLockResource>: Lock: testdb.t1.rsync start
2022-09-02 10:31:17.850 [INFO ] <CLockResource>: Lock: testdb.t1.rsync end success
2022-09-02 10:31:17.850 [INFO ] <CLockResource>: Lock: testdb.t1824A6746-7C28-4B39-B1E6-D6E088F858AA start
2022-09-02 10:31:17.852 [INFO ] <CLockResource>: Lock: testdb.t1824A6746-7C28-4B39-B1E6-D6E088F858AA end success
2022-09-02 10:31:17.852 [INFO ] <CLockResource>: Lock: testdb.t1D88707C8-BB5F-41D8-AAF9-55A76780953B start
2022-09-02 10:31:17.853 [INFO ] <CLockResource>: Lock: testdb.t1D88707C8-BB5F-41D8-AAF9-55A76780953B end success
2022-09-02 10:31:17.860 [INFO ] <STORAGE-Recover-1>: BackUpTable, meta broken table(t1_n1:7) has been deleted and backuped success
2022-09-02 10:31:17.896 [INFO ] <STORAGE-Recover-1>: CreateFailNodeField, do "CREATE TABLE /*+ distribution(1)*/ `testdb`.`t1_n1`(
  "id" int(11) DEFAULT NULL,
  "name" varchar(100) DEFAULT NULL
) ENGINE=EXPRESS TID(8) UID(1) DEFAULT CHARSET=utf8 TABLESPACE='sys_tablespace' COLUMN_IDS(0, 1) TID(8)" success
2022-09-02 10:31:17.896 [INFO ] <STORAGE-Recover-1>: CreateTableInfoFromSafeGroup, do CreateFailNodeField success
2022-09-02 10:31:17.896 [INFO ] <STORAGE-Recover-1>: DO_DATA_SYNC, Lock_for_ddl
2022-09-02 10:31:17.896 [INFO ] <STORAGE-Recover-1>: SYNC_SLICE, DISTRIBUTED TABLE: DB: 'testdb', Table: 't1', slice: 'n1'
2022-09-02 10:31:17.896 [INFO ] <STORAGE-Recover-1>: CALL_SYNC, Syncing table slice{DB: 'testdb', Table: 't1', Slice: 'n1'} from sample gnode: '10.0.2.107'
2022-09-02 10:31:17.896 [INFO ] <STORAGE-Recover-1>: CALL_SYNC, Sync command line: 'start 'gc_sync_client' '10.0.2.107 testdb t1_n1 10.0.2.106 5258 3 1''
2022-09-02 10:31:18.482 [INFO ] <STORAGE-Recover-1>: check drop sql thread quit.
2022-09-02 10:31:18.482 [INFO ] <STORAGE-Recover-1>: start_sync_by_node, sync table{start 'gc_sync_client' '10.0.2.107 testdb t1_n1 10.0.2.106 5258 3 1'} returned with (0)
2022-09-02 10:31:18.482 [INFO ] <STORAGE-Recover-1>: CALL_SYNC, Sync client executed successfully
2022-09-02 10:31:18.482 [INFO ] <STORAGE-Recover-1>: REFRESH_TABLE, Refreshing local gnode, DB: 'testdb', Table: 't1', slice: 'n1'
2022-09-02 10:31:18.482 [INFO ] <STORAGE-Recover-1>: REFRESH_TABLE, Refresh query: 'refresh table `testdb`.`t1_n1`;'
2022-09-02 10:31:18.483 [INFO ] <STORAGE-Recover-1>: Sync cl
2022-09-02 10:31:18.483 [INFO ] <STORAGE-Recover-1>: CALL_SYNC, Sync end, return 0
2022-09-02 10:31:18.483 [INFO ] <CLockResource>: Lock: testdb.t1.09B5BEEC-1EF7-4FA6-9850-C4217A781E0F start
2022-09-02 10:31:18.485 [INFO ] <CLockResource>: Lock: testdb.t1.09B5BEEC-1EF7-4FA6-9850-C4217A781E0F end success
2022-09-02 10:31:18.486 [INFO ] <STORAGE-Recover-1>: CHECK_SYNC_RESULT,Checking whether SYNC is succeeded{DB: 'testdb', Table: 't1', slice: 'n1'}
2022-09-02 10:31:18.489 [INFO ] <STORAGE-Recover-1>: CHECK_SYNC_RESULT, SCN of table slice of local GNode and sample GNode(number of working nodes: 2)
2022-09-02 10:31:18.489 [INFO ] <STORAGE-Recover-1>: CHECK_SYNC_RESULT,        Local GNode: 27
2022-09-02 10:31:18.489 [INFO ] <STORAGE-Recover-1>: CHECK_SYNC_RESULT,    Sample GNode[1]: 27
2022-09-02 10:31:18.489 [INFO ] <CLockResource>: UnLock: testdb.t1.09B5BEEC-1EF7-4FA6-9850-C4217A781E0F unlock success
2022-09-02 10:31:18.490 [INFO ] <STORAGE-Recover-1>: CreateTableInfoFromSafeGroup, do CallSync success
2022-09-02 10:31:18.490 [INFO ] <CLockResource>: UnLock: testdb unlock success
2022-09-02 10:31:18.492 [INFO ] <CLockResource>: UnLock: testdb.t1.rsync unlock success
2022-09-02 10:31:18.493 [INFO ] <CLockResource>: UnLock: testdb.t1824A6746-7C28-4B39-B1E6-D6E088F858AA unlock success
2022-09-02 10:31:18.494 [INFO ] <CLockResource>: UnLock: testdb.t1D88707C8-BB5F-41D8-AAF9-55A76780953B unlock success
2022-09-02 10:31:18.502 [INFO ] <STORAGE-Recover-1>: CreateTableInfoFromSafeGeoup success!
2022-09-02 10:31:18.502 [INFO ] <STORAGE-Recover-1>: delete DMLStorge Redo Log start, eventId=4.............
2022-09-02 10:31:18.510 [INFO ] <STORAGE-Recover-1>: delete DMLStorge Redo Log end, eventId=4.............
2022-09-02 10:31:18.512 [INFO ] <RECOVER-INFO-1>: Finishing Recovering testdb.t1,tid 7

测试主备分片表结构不一致

故障模拟

手工增加一个列。 注意SCN号要保持一致。

gbase> select scn from information_schema.tables where table_name='t1_n1';
+------+
| scn  |
+------+
|   27 |
+------+
1 row in set (Elapsed: 00:00:00.00)

gbase> desc t1_n1;
+-------+--------------+------+-----+---------+-------+
| Field | Type         | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| id    | int(11)      | YES  |     | NULL    |       |
| name  | varchar(100) | YES  |     | NULL    |       |
+-------+--------------+------+-----+---------+-------+
2 rows in set (Elapsed: 00:00:00.00)

gbase> SET SELF SCN =27;
Query OK, 0 rows affected (Elapsed: 00:00:00.00)

gbase> alter table t1_n1 add column age int;
Query OK, 11 rows affected (Elapsed: 00:00:00.08)
Records: 11  Duplicates: 0  Warnings: 0

gbase> desc t1_n1;
+-------+--------------+------+-----+---------+-------+
| Field | Type         | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| id    | int(11)      | YES  |     | NULL    |       |
| name  | varchar(100) | YES  |     | NULL    |       |
| age   | int(11)      | YES  |     | NULL    |       |
+-------+--------------+------+-----+---------+-------+
3 rows in set (Elapsed: 00:00:00.00)

gbase> select scn from information_schema.tables where table_name='t1_n1';
+------+
| scn  |
+------+
|   27 |
+------+
1 row in set (Elapsed: 00:00:00.00)

加载测试

方法一样。在设置参数后,第一次加载失败,数据库会设置标志来自动恢复,第二次加载成功。

注意报错信息:column count: 3 -- 2, 表示列数不一样。

gbase> load data infile 'ftp://gbase:gbase1234@10.0.2.106/t1.txt' into table t1 fields terminated by ',';
ERROR 1733 (HY000): (GBA-01EX-700) Gbase general error: Task 28 failed, [10.0.2.107:5050](GBA-02AD-0005)Failed to query in gnode:
DETAIL: (GBA-01EX-700) Gbase general error: (gns_host: 10.0.2.106) source table and destination table are not same: src:( localhost:testdb.t1_n1 ) dst:( 10.0.2.107:testdb.t1_n1 ) column count: 3 -- 2
SQL: LOAD /*+ TID('181') */ DATA INFILE 'ftp://gbase:*********@10.0.2.106/t1.txt#offset=0&length=42&firstblock&ffsize=42' INTO TABLE `testdb`.`t1_n1` DATA_FORMAT 3 FILE_FORMAT UNDEFINED  FIELDS TERMINA
gbase> load data infile 'ftp://gbase:gbase1234@10.0.2.106/t1.txt' into table t1 fields terminated by ',';
ERROR 1733 (HY000): (GBA-01EX-700) Gbase general error: Task 29 failed, [10.0.2.107:5050](GBA-02AD-0005)Failed to query in gnode:
DETAIL: (GBA-01EX-700) Gbase general error: (gns_host: 10.0.2.106) source table and destination table are not same: src:( localhost:testdb.t1_n1 ) dst:( 10.0.2.107:testdb.t1_n1 ) column count: 3 -- 2
SQL: LOAD /*+ TID('183') */ DATA INFILE 'ftp://gbase:*********@10.0.2.106/t1.txt#offset=0&length=42&firstblock&ffsize=42' INTO TABLE `testdb`.`t1_n1` DATA_FORMAT 3 FILE_FORMAT UNDEFINED  FIELDS TERMINA
gbase> set gcluster_suffix_consistency_resolve=1;
Query OK, 0 rows affected (Elapsed: 00:00:00.00)

gbase> load data infile 'ftp://gbase:gbase1234@10.0.2.106/t1.txt' into table t1 fields terminated by ',';
ERROR 1733 (HY000): (GBA-01EX-700) Gbase general error: Task 30 failed, [10.0.2.106:5050](GBA-02AD-0005)Failed to query in gnode:
DETAIL: (GBA-01EX-700) Gbase general error: (gns_host: 10.0.2.106) source table and destination table are not same: src:( localhost:testdb.t1_n1 ) dst:( 10.0.2.107:testdb.t1_n1 ) column count: 3 -- 2
SQL: LOAD /*+ TID('186') */ DATA INFILE 'ftp://gbase:*********@10.0.2.106/t1.txt#offset=0&length=42&firstblock&ffsize=42' INTO TABLE `testdb`.`t1_n1` DATA_FORMAT 3 FILE_FORMAT UNDEFINED  FIELDS TERMINA
gbase> load data infile 'ftp://gbase:gbase1234@10.0.2.106/t1.txt' into table t1 fields terminated by ',';
Query OK, 6 rows affected (Elapsed: 00:00:00.52)
Task 31 finished, Loaded 6 records, Skipped 0 records

查看日志

连接节点 106 gcluster的express.log

注意其中的SUFFIX_INCONSISTENCY_JUDGER部分,设置了storageevent 5

2022-09-02 10:38:15.969 [SQLDISP][ERROR][S:78][Q:192]:Query failed, SessionCtx(0x5764000) HOST(10.0.2.106:5050->132) reason: (GBA-01EX-700) Gbase general error: (gns_host: 10.0.2.106) source table and destination table are not same: src:( localhost:testdb.t1_n1 ) dst:( 10.0.2.107:testdb.t1_n1 ) column count: 3 -- 2 SQL(LOAD /*+ TID('186') */ DATA INFILE 'ftp://gbase:*********@10.0.2.106/t1.txt#offset=0&length=42&firstblock&ffsize=42' INTO TABLE `testdb`.`t1_n1` DATA_FORMAT 3 FILE_FORMAT UNDEFINED  FIELDS TERMINATED BY ',' HOST '10.0.2.106' CURRENT_TIMESTAMP 1662086295 SCN_NUMBER 30 GCLUSTER_PORT 5258 INTO SERVER (HOST '10.0.2.106, 10.0.2.107, 10.0.2.108', PORT '5050', USER 'root', DATABASE 'testdb', TABLE 't1_n1, t1_n2, t1_n3', COMMENT 'table_host 0 0 1 # 1 1 2 # 2 2 0, scn 30, group 0 ,distribution 1, hash_map 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0
2022-09-02 10:38:15.996 [SUFFIX_INCONSISTENCY_JUDGER][WARN ][S:78][Q:192]:The same with: [10.0.2.107_n1][10.0.2.108_n2]
2022-09-02 10:38:15.996 [SUFFIX_INCONSISTENCY_JUDGER][WARN ][S:78][Q:192]:host 10.0.2.106 table:testdb.t1_n1 judeged by compare table defination.
2022-09-02 10:38:16.010 [SUFFIX_INCONSISTENCY_JUDGER][WARN ][S:78][Q:192]:Set testdb.t1 suffix n1 StorageEvent successfully, event id is 5

2022-09-02 10:38:16.011 [LOAD][ERROR][S:78][Q:192]:Task 30 failed, [10.0.2.106:5050](GBA-02AD-0005)Failed to query in gnode:
DETAIL: (GBA-01EX-700) Gbase general error: (gns_host: 10.0.2.106) source table and destination table are not same: src:( localhost:testdb.t1_n1 ) dst:( 10.0.2.107:testdb.t1_n1 ) column count: 3 -- 2
SQL: LOAD /*+ TID('186') */ DATA INFILE 'ftp://gbase:*********@10.0.2.106/t1.txt#offset=0&length=42&firstblock&ffsize=42' INTO TABLE `testdb`.`t1_n1` DATA_FORMAT 3 FILE_FORMAT UNDEFINED  FIELDS TERMINATED BY ',' HOST '10.0.2.106' CURRENT_TIMESTAMP 1662086295 SCN_NUMBER 30 GCLUSTER_PORT 5258 INTO SERVER (HOST '10.0.2.106, 10.0.2.107, 10.0.2.108', PORT '5050', USER 'root', DATABASE 'testdb', TABLE 't1_n1, t1_n2, t1_n3', COMMENT 'table_host 0 0 1 # 1 1 2 # 2 2 0, scn 30, group 0 ,distribution 1, hash_map 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 ...  1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1
2022-09-02 10:38:16.135 [SQLDISP][ERROR][S:78][Q:192]:Query failed, SessionCtx(0x5764000) HOST(10.0.2.106:5050->120) reason: Can't revert table: Table testdb.t1_n1 can not do revert with scn(30/cur:27) SQL(revert table "testdb"."t1_n1" scn_number 30)
2022-09-02 10:38:16.144 [SQLDISP][ERROR][S:78][Q:192]:Query failed, SessionCtx(0x5764000) HOST(10.0.2.107:5050->123) reason: Can't revert table: Table testdb.t1_n1 can not do revert with scn(30/cur:27) SQL(revert table "testdb"."t1_n1" scn_number 30)
2022-09-02 10:38:16.144 [SQLDISP][ERROR][S:78][Q:192]:Query failed, SessionCtx(0x5764000) HOST(10.0.2.108:5050->108) reason: Can't revert table: Table testdb.t1_n2 can not do revert with scn(30/cur:27) SQL(revert table "testdb"."t1_n2" scn_number 30)
2022-09-02 10:38:16.144 [SQLDISP][ERROR][S:78][Q:192]:Query failed, SessionCtx(0x5764000) HOST(10.0.2.107:5050->124) reason: Can't revert table: Table testdb.t1_n2 can not do revert with scn(30/cur:27) SQL(revert table "testdb"."t1_n2" scn_number 30)
2022-09-02 10:38:16.144 [SQLDISP][ERROR][S:78][Q:192]:Query failed, SessionCtx(0x5764000) HOST(10.0.2.108:5050->109) reason: Can't revert table: Table testdb.t1_n3 can not do revert with scn(30/cur:27) SQL(revert table "testdb"."t1_n3" scn_number 30)
2022-09-02 10:38:16.145 [SQLDISP][ERROR][S:78][Q:192]:Query failed, SessionCtx(0x5764000) HOST(10.0.2.106:5050->121) reason: Can't revert table: Table testdb.t1_n3 can not do revert with scn(30/cur:27) SQL(revert table "testdb"."t1_n3" scn_number 30)
2022-09-02 10:38:16.145 [COMMIT][WARN ][S:78][Q:192]:revert in host:10.0.2.106 for table:t1 success.
2022-09-02 10:38:16.145 [COMMIT][WARN ][S:78][Q:192]:revert in host:10.0.2.107 for table:t1 success.
2022-09-02 10:38:16.145 [COMMIT][WARN ][S:78][Q:192]:revert in host:10.0.2.107 for table:t1 success.
2022-09-02 10:38:16.145 [COMMIT][WARN ][S:78][Q:192]:revert in host:10.0.2.108 for table:t1 success.
2022-09-02 10:38:16.145 [COMMIT][WARN ][S:78][Q:192]:revert in host:10.0.2.108 for table:t1 success.
2022-09-02 10:38:16.145 [COMMIT][WARN ][S:78][Q:192]:revert in host:10.0.2.106 for table:t1 success.
2022-09-02 10:38:16.158 [COMMIT][INFO ][S:78][Q:192]:rollback table(testdb.t1) success for scn:30.
2022-09-02 10:38:17.402 [LOAD][WARN ][S:78][Q:193]:slice(n1) on node(10.0.2.106) is invalid.
2022-09-02 10:38:17.678 [DEF  ] [S:78][Q:193][load]Node(10.0.2.106)testdb.t1[n1][scn:31]'s data state fails
2022-09-02 10:38:17.678 [DEF  ] [S:78][Q:193]Set data(testdb.t1:1) to LOCKED
2022-09-02 10:38:17.700 [DEF  ] [S:78][Q:193]Set data(testdb.t1:normal) to LOCKED, event id is 5
2022-09-02 10:38:17.881 [DEF  ] [S:78][Q:193][load]Node(10.0.2.106)testdb.t1[n1][scn:31]'s data state fails
2022-09-02 10:38:17.881 [DEF  ] [S:78][Q:193]Set data(testdb.t1:1) to LOCKED
2022-09-02 10:38:17.897 [DEF  ] [S:78][Q:193]Set data(testdb.t1:normal) to LOCKED, event id is 5

gc_recovery

2022-09-02 10:38:18.632 [INFO ] <RECOVER-INFO-1>: Start Recovering testdb.t1 tid 7
2022-09-02 10:38:18.633 [INFO ] <DATA-Recover-1>: Start dml recover testdb.t1,tid 7
2022-09-02 10:38:18.633 [INFO ] <DATA-Recover-1>: Start to dml recover of eventid(5)
2022-09-02 10:38:18.642 [INFO ] <GCWare>: GetSourceNodeForDMl for nodeId(1778515978), the node is unValid
2022-09-02 10:38:18.642 [INFO ] <DATA-Recover-1>: DoNormSqlNoResult, do(flush rollback `testdb`.`t1_n1`)
2022-09-02 10:38:18.645 [INFO ] <DATA-Recover-1>: recoverinfo gn dml: eventid:5 DB(testdb), TABLE(t1), SLICE(1), TID(7), src:0x1795293194(10.0.2.107) dst:0x1778515978(10.0.2.106)
2022-09-02 10:38:18.646 [INFO ] <CLockResource>: Lock: testdb start
2022-09-02 10:38:18.647 [INFO ] <CLockResource>: Lock: testdb end success
2022-09-02 10:38:18.647 [INFO ] <CLockResource>: Lock: testdb.t1.rsync start
2022-09-02 10:38:18.648 [INFO ] <CLockResource>: Lock: testdb.t1.rsync end success
2022-09-02 10:38:18.648 [INFO ] <CLockResource>: Lock: testdb.t1824A6746-7C28-4B39-B1E6-D6E088F858AA start
2022-09-02 10:38:18.650 [INFO ] <CLockResource>: Lock: testdb.t1824A6746-7C28-4B39-B1E6-D6E088F858AA end success
2022-09-02 10:38:18.650 [INFO ] <CLockResource>: Lock: testdb.t1D88707C8-BB5F-41D8-AAF9-55A76780953B start
2022-09-02 10:38:18.651 [INFO ] <CLockResource>: Lock: testdb.t1D88707C8-BB5F-41D8-AAF9-55A76780953B end success
2022-09-02 10:38:18.652 [INFO ] <DATA-Recover-1>: DO_DATA_SYNC, Lock_for_ddl
2022-09-02 10:38:18.652 [INFO ] <DATA-Recover-1>: SYNC_SLICE, DISTRIBUTED TABLE: DB: 'testdb', Table: 't1', slice: 'n1'
2022-09-02 10:38:18.652 [INFO ] <GCWare>: testdb.t1 nodeid: 1778515978, segid: 1, tableid 7 have dmlstorageevent,total num 1,lasteventid: 5
2022-09-02 10:38:18.652 [INFO ] <DATA-Recover-1>: SYNC_SLICE, Find it is a meta broken table testdb.t1,and the event will be given up,event id 5
2022-09-02 10:38:18.664 [INFO ] <DATA-Recover-1>: SET_ONLINE, Succeeded in setting local node table 'testdb.t1_n1' online, eventinfo(eventid:5,segId:1)
2022-09-02 10:38:18.665 [INFO ] <CLockResource>: UnLock: testdb unlock success
2022-09-02 10:38:18.666 [INFO ] <CLockResource>: UnLock: testdb.t1.rsync unlock success
2022-09-02 10:38:18.668 [INFO ] <CLockResource>: UnLock: testdb.t1824A6746-7C28-4B39-B1E6-D6E088F858AA unlock success
2022-09-02 10:38:18.669 [INFO ] <CLockResource>: UnLock: testdb.t1D88707C8-BB5F-41D8-AAF9-55A76780953B unlock success
2022-09-02 10:38:18.670 [INFO ] <DATA-Recover-1>: End dml recover testdb.t1,tid 7
2022-09-02 10:38:18.670 [INFO ] <STORAGE-Recover-1>: Start DMLStorge recover testdb.t1,tid 7 eventnum 1
2022-09-02 10:38:18.670 [INFO ] <STORAGE-Recover-1>: Start to DMLStorge recover of eventid(5)
2022-09-02 10:38:18.671 [INFO ] <STORAGE-Recover-1>: recoverinfo gn(dmls): eventid=5, tablename=testdb.t1, slice:n1 src:0x1795293194(10.0.2.106) dst:0x1778515978(10.0.2.106)
2022-09-02 10:38:18.671 [INFO ] <STORAGE-Recover-1>: Revovering ..., for dmlstorage, forgcluster=false, dbname=testdb, tablename=t1, slice=n1, tid=7, eventid=5
2022-09-02 10:38:18.683 [INFO ] <CLockResource>: Lock: testdb start
2022-09-02 10:38:18.684 [INFO ] <CLockResource>: Lock: testdb end success
2022-09-02 10:38:18.684 [INFO ] <CLockResource>: Lock: testdb.t1.rsync start
2022-09-02 10:38:18.686 [INFO ] <CLockResource>: Lock: testdb.t1.rsync end success
2022-09-02 10:38:18.686 [INFO ] <CLockResource>: Lock: testdb.t1824A6746-7C28-4B39-B1E6-D6E088F858AA start
2022-09-02 10:38:18.687 [INFO ] <CLockResource>: Lock: testdb.t1824A6746-7C28-4B39-B1E6-D6E088F858AA end success
2022-09-02 10:38:18.687 [INFO ] <CLockResource>: Lock: testdb.t1D88707C8-BB5F-41D8-AAF9-55A76780953B start
2022-09-02 10:38:18.688 [INFO ] <CLockResource>: Lock: testdb.t1D88707C8-BB5F-41D8-AAF9-55A76780953B end success
2022-09-02 10:38:18.695 [INFO ] <STORAGE-Recover-1>: BackUpTable, meta broken table(t1_n1:7) has been deleted and backuped success
2022-09-02 10:38:18.727 [INFO ] <STORAGE-Recover-1>: CreateFailNodeField, do "CREATE TABLE /*+ distribution(1)*/ `testdb`.`t1_n1`(
  "id" int(11) DEFAULT NULL,
  "name" varchar(100) DEFAULT NULL
) ENGINE=EXPRESS TID(8) UID(1) DEFAULT CHARSET=utf8 TABLESPACE='sys_tablespace' COLUMN_IDS(0, 1) TID(8)" success
2022-09-02 10:38:18.728 [INFO ] <STORAGE-Recover-1>: CreateTableInfoFromSafeGroup, do CreateFailNodeField success
2022-09-02 10:38:18.728 [INFO ] <STORAGE-Recover-1>: DO_DATA_SYNC, Lock_for_ddl
2022-09-02 10:38:18.728 [INFO ] <STORAGE-Recover-1>: SYNC_SLICE, DISTRIBUTED TABLE: DB: 'testdb', Table: 't1', slice: 'n1'
2022-09-02 10:38:18.728 [INFO ] <STORAGE-Recover-1>: CALL_SYNC, Syncing table slice{DB: 'testdb', Table: 't1', Slice: 'n1'} from sample gnode: '10.0.2.107'
2022-09-02 10:38:18.728 [INFO ] <STORAGE-Recover-1>: CALL_SYNC, Sync command line: 'start 'gc_sync_client' '10.0.2.107 testdb t1_n1 10.0.2.106 5258 3 1''
2022-09-02 10:38:19.242 [INFO ] <STORAGE-Recover-1>: check drop sql thread quit.
2022-09-02 10:38:19.242 [INFO ] <STORAGE-Recover-1>: start_sync_by_node, sync table{start 'gc_sync_client' '10.0.2.107 testdb t1_n1 10.0.2.106 5258 3 1'} returned with (0)
2022-09-02 10:38:19.242 [INFO ] <STORAGE-Recover-1>: CALL_SYNC, Sync client executed successfully
2022-09-02 10:38:19.242 [INFO ] <STORAGE-Recover-1>: REFRESH_TABLE, Refreshing local gnode, DB: 'testdb', Table: 't1', slice: 'n1'
2022-09-02 10:38:19.242 [INFO ] <STORAGE-Recover-1>: REFRESH_TABLE, Refresh query: 'refresh table `testdb`.`t1_n1`;'
2022-09-02 10:38:19.243 [INFO ] <STORAGE-Recover-1>: Sync cl
2022-09-02 10:38:19.243 [INFO ] <STORAGE-Recover-1>: CALL_SYNC, Sync end, return 0
2022-09-02 10:38:19.243 [INFO ] <CLockResource>: Lock: testdb.t1.09B5BEEC-1EF7-4FA6-9850-C4217A781E0F start
2022-09-02 10:38:19.245 [INFO ] <CLockResource>: Lock: testdb.t1.09B5BEEC-1EF7-4FA6-9850-C4217A781E0F end success
2022-09-02 10:38:19.246 [INFO ] <STORAGE-Recover-1>: CHECK_SYNC_RESULT,Checking whether SYNC is succeeded{DB: 'testdb', Table: 't1', slice: 'n1'}
2022-09-02 10:38:19.250 [INFO ] <STORAGE-Recover-1>: CHECK_SYNC_RESULT, SCN of table slice of local GNode and sample GNode(number of working nodes: 2)
2022-09-02 10:38:19.250 [INFO ] <STORAGE-Recover-1>: CHECK_SYNC_RESULT,        Local GNode: 31
2022-09-02 10:38:19.250 [INFO ] <STORAGE-Recover-1>: CHECK_SYNC_RESULT,    Sample GNode[1]: 31
2022-09-02 10:38:19.250 [INFO ] <CLockResource>: UnLock: testdb.t1.09B5BEEC-1EF7-4FA6-9850-C4217A781E0F unlock success
2022-09-02 10:38:19.251 [INFO ] <STORAGE-Recover-1>: CreateTableInfoFromSafeGroup, do CallSync success
2022-09-02 10:38:19.252 [INFO ] <CLockResource>: UnLock: testdb unlock success
2022-09-02 10:38:19.254 [INFO ] <CLockResource>: UnLock: testdb.t1.rsync unlock success
2022-09-02 10:38:19.255 [INFO ] <CLockResource>: UnLock: testdb.t1824A6746-7C28-4B39-B1E6-D6E088F858AA unlock success
2022-09-02 10:38:19.257 [INFO ] <CLockResource>: UnLock: testdb.t1D88707C8-BB5F-41D8-AAF9-55A76780953B unlock success
2022-09-02 10:38:19.266 [INFO ] <STORAGE-Recover-1>: CreateTableInfoFromSafeGeoup success!
2022-09-02 10:38:19.266 [INFO ] <STORAGE-Recover-1>: delete DMLStorge Redo Log start, eventId=5.............
2022-09-02 10:38:19.275 [INFO ] <STORAGE-Recover-1>: delete DMLStorge Redo Log end, eventId=5.............
2022-09-02 10:38:19.277 [INFO ] <RECOVER-INFO-1>: Finishing Recovering testdb.t1,tid 7

测试SCN不一致

故障模拟

直接对分片操作,不设置SCN值,会导致表的SCN数值不一样。我们插入一行数据。

[gbase@862b43 gcinstall_862.33.35]$ gncli testdb

GBase client 8.6.2.33-R48.274e5f5e. Copyright (c) 2004-2022, GBase.  All Rights Reserved.

gbase> select * from t1_n1;
+------+--------+
| id   | name   |
+------+--------+
|    4 | 44444  |
|    6 | 66666  |
|    7 | 777777 |
|    4 | 4444   |
|    6 | 6666   |
+------+--------+
5 rows in set (Elapsed: 00:00:00.00)

gbase> insert into t1_n1 values(4,4444444444444);
Query OK, 1 row affected (Elapsed: 00:00:00.06)
Records: 1  Duplicates: 0  Warnings: 0 Total: 6

测试加载

故障信息是:table_max_scn: 0 -- 14, SCN号不同

[gbase@862b43 gcinstall_862.33.35]$ gccli testdb

GBase client 8.6.2.33-R48.274e5f5e. Copyright (c) 2004-2022, GBase.  All Rights Reserved.

gbase> load data infile 'ftp://gbase:gbase1234@10.0.2.106/t1.txt' into table t1 fields terminated by ',';
ERROR 1733 (HY000): (GBA-01EX-700) Gbase general error: Task 15 failed, [10.0.2.108:5050](GBA-02AD-0005)Failed to query in gnode:
DETAIL: (GBA-01EX-700) Gbase general error: (gns_host: 10.0.2.106) source table and destination table are not same: src:( localhost:testdb.t1_n1 ) dst:( 10.0.2.107:testdb.t1_n1 ) table_max_scn: 0 -- 14
SQL: LOAD /*+ TID('110') */ DATA INFILE 'ftp://gbase:*********@10.0.2.106/t1.txt#offset=0&length=42&firstblock&ffsize=42' INTO TABLE `testdb`.`t1_n2` DATA_FORMAT 3 FILE_FORMAT UNDEFINED  FIELDS TERMI
gbase> load data infile 'ftp://gbase:gbase1234@10.0.2.106/t1.txt' into table t1 fields terminated by ',';
ERROR 1733 (HY000): (GBA-01EX-700) Gbase general error: Task 16 failed, [10.0.2.106:5050](GBA-02AD-0005)Failed to query in gnode:
DETAIL: (GBA-01EX-700) Gbase general error: (gns_host: 10.0.2.106) source table and destination table are not same: src:( localhost:testdb.t1_n1 ) dst:( 10.0.2.107:testdb.t1_n1 ) table_max_scn: 0 -- 14
SQL: LOAD /*+ TID('112') */ DATA INFILE 'ftp://gbase:*********@10.0.2.106/t1.txt#offset=0&length=42&firstblock&ffsize=42' INTO TABLE `testdb`.`t1_n1` DATA_FORMAT 3 FILE_FORMAT UNDEFINED  FIELDS TERMI
gbase>

gbase> show variables like '%gcluster_suffix_consistency_resolve%';
+-------------------------------------+-------+
| Variable_name                       | Value |
+-------------------------------------+-------+
| gcluster_suffix_consistency_resolve | 0     |
+-------------------------------------+-------+
1 row in set (Elapsed: 00:00:00.01)

gbase> set gcluster_suffix_consistency_resolve=1;
Query OK, 0 rows affected (Elapsed: 00:00:00.01)

gbase> show variables like '%gcluster_suffix_consistency_resolve%';
+-------------------------------------+-------+
| Variable_name                       | Value |
+-------------------------------------+-------+
| gcluster_suffix_consistency_resolve | 1     |
+-------------------------------------+-------+
1 row in set (Elapsed: 00:00:00.01)

gbase> load data infile 'ftp://gbase:gbase1234@10.0.2.106/t1.txt' into table t1 fields terminated by ',';
ERROR 1733 (HY000): (GBA-01EX-700) Gbase general error: Task 17 failed, [10.0.2.107:5050](GBA-02AD-0005)Failed to query in gnode:
DETAIL: (GBA-01EX-700) Gbase general error: (gns_host: 10.0.2.106) source table and destination table are not same: src:( localhost:testdb.t1_n1 ) dst:( 10.0.2.107:testdb.t1_n1 ) table_max_scn: 0 -- 14
SQL: LOAD /*+ TID('118') */ DATA INFILE 'ftp://gbase:*********@10.0.2.106/t1.txt#offset=0&length=42&firstblock&ffsize=42' INTO TABLE `testdb`.`t1_n1` DATA_FORMAT 3 FILE_FORMAT UNDEFINED  FIELDS TERMI
gbase>
gbase> load data infile 'ftp://gbase:gbase1234@10.0.2.106/t1.txt' into table t1 fields terminated by ',';
Query OK, 6 rows affected (Elapsed: 00:00:00.52)
Task 18 finished, Loaded 6 records, Skipped 0 records

查看日志

连接节点 106 gcluster的express.log

2022-09-02 09:37:26.662 [SQLDISP][ERROR][S:57][Q:131]:Query failed, SessionCtx(0x5732800) HOST(10.0.2.107:5050->44) reason: (GBA-01EX-700) Gbase general error: (gns_host: 10.0.2.106) source table and destination table are not same: src:( localhost:testdb.t1_n1 ) dst:( 10.0.2.107:testdb.t1_n1 ) table_max_scn: 0 -- 14 SQL(LOAD /*+ TID('118') */ DATA INFILE 'ftp://gbase:*********@10.0.2.106/t1.txt#offset=0&length=42&firstblock&ffsize=42' INTO TABLE `testdb`.`t1_n1` DATA_FORMAT 3 FILE_FORMAT UNDEFINED  FIELDS TERMINATED BY ',' HOST '10.0.2.107' CURRENT_TIMESTAMP 1662082646 SCN_NUMBER 17 GCLUSTER_PORT 5258 INTO SERVER (HOST '10.0.2.106, 10.0.2.107, 10.0.2.108', PORT '5050', USER 'root', DATABASE 'testdb', TABLE 't1_n1, t1_n2, t1_n3', COMMENT 'table_host 0 0 1 # 1 1 2 # 2 2 0, scn 17, group 0 ,distribution 1, hash_map 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0
2022-09-02 09:37:26.691 [SUFFIX_INCONSISTENCY_JUDGER][WARN ][S:57][Q:131]:The same with: [10.0.2.107_n1]
2022-09-02 09:37:26.691 [SUFFIX_INCONSISTENCY_JUDGER][WARN ][S:57][Q:131]:host 10.0.2.106 table:testdb.t1_n1 judeged by compare scn.
2022-09-02 09:37:26.704 [SUFFIX_INCONSISTENCY_JUDGER][WARN ][S:57][Q:131]:Set testdb.t1 suffix n1 StorageEvent successfully, event id is 2

2022-09-02 09:37:26.705 [LOAD][ERROR][S:57][Q:131]:Task 17 failed, [10.0.2.107:5050](GBA-02AD-0005)Failed to query in gnode:
DETAIL: (GBA-01EX-700) Gbase general error: (gns_host: 10.0.2.106) source table and destination table are not same: src:( localhost:testdb.t1_n1 ) dst:( 10.0.2.107:testdb.t1_n1 ) table_max_scn: 0 -- 14
SQL: LOAD /*+ TID('118') */ DATA INFILE 'ftp://gbase:*********@10.0.2.106/t1.txt#offset=0&length=42&firstblock&ffsize=42' INTO TABLE `testdb`.`t1_n1` DATA_FORMAT 3 FILE_FORMAT UNDEFINED  FIELDS TERMINATED BY ',' HOST '10.0.2.107' CURRENT_TIMESTAMP 1662082646 SCN_NUMBER 17 GCLUSTER_PORT 5258 INTO SERVER (HOST '10.0.2.106, 10.0.2.107, 10.0.2.108', PORT '5050', USER 'root', DATABASE 'testdb', TABLE 't1_n1, t1_n2, t1_n3', COMMENT 'table_host 0 0 1 # 1 1 2 # 2 2 0, scn 17, group 0 ,distribution 1, hash_map 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 ...  1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0
2022-09-02 09:37:26.830 [SQLDISP][ERROR][S:57][Q:131]:Query failed, SessionCtx(0x5732800) HOST(10.0.2.106:5050->29) reason: Can't revert table: Table testdb.t1_n1 can not do revert with scn(17/cur:0) SQL(revert table "testdb"."t1_n1" scn_number 17)
2022-09-02 09:37:26.840 [SQLDISP][ERROR][S:57][Q:131]:Query failed, SessionCtx(0x5732800) HOST(10.0.2.107:5050->32) reason: Can't revert table: Table testdb.t1_n1 can not do revert with scn(17/cur:14) SQL(revert table "testdb"."t1_n1" scn_number 17)
2022-09-02 09:37:26.840 [SQLDISP][ERROR][S:57][Q:131]:Query failed, SessionCtx(0x5732800) HOST(10.0.2.108:5050->30) reason: Can't revert table: Table testdb.t1_n2 can not do revert with scn(17/cur:14) SQL(revert table "testdb"."t1_n2" scn_number 17)
2022-09-02 09:37:26.840 [SQLDISP][ERROR][S:57][Q:131]:Query failed, SessionCtx(0x5732800) HOST(10.0.2.107:5050->33) reason: Can't revert table: Table testdb.t1_n2 can not do revert with scn(17/cur:14) SQL(revert table "testdb"."t1_n2" scn_number 17)
2022-09-02 09:37:26.840 [SQLDISP][ERROR][S:57][Q:131]:Query failed, SessionCtx(0x5732800) HOST(10.0.2.106:5050->30) reason: Can't revert table: Table testdb.t1_n3 can not do revert with scn(17/cur:14) SQL(revert table "testdb"."t1_n3" scn_number 17)
2022-09-02 09:37:26.840 [SQLDISP][ERROR][S:57][Q:131]:Query failed, SessionCtx(0x5732800) HOST(10.0.2.108:5050->31) reason: Can't revert table: Table testdb.t1_n3 can not do revert with scn(17/cur:14) SQL(revert table "testdb"."t1_n3" scn_number 17)
2022-09-02 09:37:26.840 [COMMIT][WARN ][S:57][Q:131]:revert in host:10.0.2.106 for table:t1 success.
2022-09-02 09:37:26.840 [COMMIT][WARN ][S:57][Q:131]:revert in host:10.0.2.107 for table:t1 success.
2022-09-02 09:37:26.840 [COMMIT][WARN ][S:57][Q:131]:revert in host:10.0.2.107 for table:t1 success.
2022-09-02 09:37:26.840 [COMMIT][WARN ][S:57][Q:131]:revert in host:10.0.2.108 for table:t1 success.
2022-09-02 09:37:26.840 [COMMIT][WARN ][S:57][Q:131]:revert in host:10.0.2.108 for table:t1 success.
2022-09-02 09:37:26.840 [COMMIT][WARN ][S:57][Q:131]:revert in host:10.0.2.106 for table:t1 success.
2022-09-02 09:37:26.853 [COMMIT][INFO ][S:57][Q:131]:rollback table(testdb.t1) success for scn:17.
2022-09-02 09:37:30.167 [LOAD][WARN ][S:57][Q:132]:slice(n1) on node(10.0.2.106) is invalid.
2022-09-02 09:37:30.464 [DEF  ] [S:57][Q:132][load]Node(10.0.2.106)testdb.t1[n1][scn:18]'s data state fails
2022-09-02 09:37:30.464 [DEF  ] [S:57][Q:132]Set data(testdb.t1:1) to LOCKED
2022-09-02 09:37:30.486 [DEF  ] [S:57][Q:132]Set data(testdb.t1:normal) to LOCKED, event id is 2
2022-09-02 09:37:30.655 [DEF  ] [S:57][Q:132][load]Node(10.0.2.106)testdb.t1[n1][scn:18]'s data state fails
2022-09-02 09:37:30.655 [DEF  ] [S:57][Q:132]Set data(testdb.t1:1) to LOCKED
2022-09-02 09:37:30.669 [DEF  ] [S:57][Q:132]Set data(testdb.t1:normal) to LOCKED, event id is 2

gc_recovyer.log

2022-09-02 09:37:46.250 [INFO ] <RECOVER-INFO-1>: Start Recovering testdb.t1 tid 7
2022-09-02 09:37:46.250 [INFO ] <DATA-Recover-1>: Start dml recover testdb.t1,tid 7
2022-09-02 09:37:46.250 [INFO ] <DATA-Recover-1>: Start to dml recover of eventid(2)
2022-09-02 09:37:46.260 [INFO ] <GCWare>: GetSourceNodeForDMl for nodeId(1778515978), the node is unValid
2022-09-02 09:37:46.262 [INFO ] <DATA-Recover-1>: DoNormSqlNoResult, do(flush rollback `testdb`.`t1_n1`)
2022-09-02 09:37:46.264 [INFO ] <DATA-Recover-1>: recoverinfo gn dml: eventid:2 DB(testdb), TABLE(t1), SLICE(1), TID(7), src:0x1795293194(10.0.2.107) dst:0x1778515978(10.0.2.106)
2022-09-02 09:37:46.265 [INFO ] <CLockResource>: Lock: testdb start
2022-09-02 09:37:46.269 [INFO ] <CLockResource>: Lock: testdb end success
2022-09-02 09:37:46.269 [INFO ] <CLockResource>: Lock: testdb.t1.rsync start
2022-09-02 09:37:46.271 [INFO ] <CLockResource>: Lock: testdb.t1.rsync end success
2022-09-02 09:37:46.271 [INFO ] <CLockResource>: Lock: testdb.t1824A6746-7C28-4B39-B1E6-D6E088F858AA start
2022-09-02 09:37:46.273 [INFO ] <CLockResource>: Lock: testdb.t1824A6746-7C28-4B39-B1E6-D6E088F858AA end success
2022-09-02 09:37:46.273 [INFO ] <CLockResource>: Lock: testdb.t1D88707C8-BB5F-41D8-AAF9-55A76780953B start
2022-09-02 09:37:46.275 [INFO ] <CLockResource>: Lock: testdb.t1D88707C8-BB5F-41D8-AAF9-55A76780953B end success
2022-09-02 09:37:46.276 [INFO ] <DATA-Recover-1>: DO_DATA_SYNC, Lock_for_ddl
2022-09-02 09:37:46.276 [INFO ] <DATA-Recover-1>: SYNC_SLICE, DISTRIBUTED TABLE: DB: 'testdb', Table: 't1', slice: 'n1'
2022-09-02 09:37:46.276 [INFO ] <GCWare>: testdb.t1 nodeid: 1778515978, segid: 1, tableid 7 have dmlstorageevent,total num 1,lasteventid: 2
2022-09-02 09:37:46.276 [INFO ] <DATA-Recover-1>: SYNC_SLICE, Find it is a meta broken table testdb.t1,and the event will be given up,event id 2
2022-09-02 09:37:46.290 [INFO ] <DATA-Recover-1>: SET_ONLINE, Succeeded in setting local node table 'testdb.t1_n1' online, eventinfo(eventid:2,segId:1)
2022-09-02 09:37:46.291 [INFO ] <CLockResource>: UnLock: testdb unlock success
2022-09-02 09:37:46.292 [INFO ] <CLockResource>: UnLock: testdb.t1.rsync unlock success
2022-09-02 09:37:46.294 [INFO ] <CLockResource>: UnLock: testdb.t1824A6746-7C28-4B39-B1E6-D6E088F858AA unlock success
2022-09-02 09:37:46.295 [INFO ] <CLockResource>: UnLock: testdb.t1D88707C8-BB5F-41D8-AAF9-55A76780953B unlock success
2022-09-02 09:37:46.296 [INFO ] <DATA-Recover-1>: End dml recover testdb.t1,tid 7
2022-09-02 09:37:46.296 [INFO ] <STORAGE-Recover-1>: Start DMLStorge recover testdb.t1,tid 7 eventnum 1
2022-09-02 09:37:46.297 [INFO ] <STORAGE-Recover-1>: Start to DMLStorge recover of eventid(2)
2022-09-02 09:37:46.297 [INFO ] <STORAGE-Recover-1>: recoverinfo gn(dmls): eventid=2, tablename=testdb.t1, slice:n1 src:0x1795293194(10.0.2.106) dst:0x1778515978(10.0.2.106)
2022-09-02 09:37:46.297 [INFO ] <STORAGE-Recover-1>: Revovering ..., for dmlstorage, forgcluster=false, dbname=testdb, tablename=t1, slice=n1, tid=7, eventid=2
2022-09-02 09:37:46.309 [INFO ] <CLockResource>: Lock: testdb start
2022-09-02 09:37:46.310 [INFO ] <CLockResource>: Lock: testdb end success
2022-09-02 09:37:46.310 [INFO ] <CLockResource>: Lock: testdb.t1.rsync start
2022-09-02 09:37:46.312 [INFO ] <CLockResource>: Lock: testdb.t1.rsync end success
2022-09-02 09:37:46.312 [INFO ] <CLockResource>: Lock: testdb.t1824A6746-7C28-4B39-B1E6-D6E088F858AA start
2022-09-02 09:37:46.313 [INFO ] <CLockResource>: Lock: testdb.t1824A6746-7C28-4B39-B1E6-D6E088F858AA end success
2022-09-02 09:37:46.313 [INFO ] <CLockResource>: Lock: testdb.t1D88707C8-BB5F-41D8-AAF9-55A76780953B start
2022-09-02 09:37:46.315 [INFO ] <CLockResource>: Lock: testdb.t1D88707C8-BB5F-41D8-AAF9-55A76780953B end success
2022-09-02 09:37:46.322 [INFO ] <STORAGE-Recover-1>: BackUpTable, meta broken table(t1_n1:7) has been deleted and backuped success
2022-09-02 09:37:46.355 [INFO ] <STORAGE-Recover-1>: CreateFailNodeField, do "CREATE TABLE /*+ distribution(1)*/ `testdb`.`t1_n1`(
  "id" int(11) DEFAULT NULL,
  "name" varchar(100) DEFAULT NULL
) ENGINE=EXPRESS TID(8) UID(1) DEFAULT CHARSET=utf8 TABLESPACE='sys_tablespace' COLUMN_IDS(0, 1) TID(8)" success
2022-09-02 09:37:46.355 [INFO ] <STORAGE-Recover-1>: CreateTableInfoFromSafeGroup, do CreateFailNodeField success
2022-09-02 09:37:46.355 [INFO ] <STORAGE-Recover-1>: DO_DATA_SYNC, Lock_for_ddl
2022-09-02 09:37:46.355 [INFO ] <STORAGE-Recover-1>: SYNC_SLICE, DISTRIBUTED TABLE: DB: 'testdb', Table: 't1', slice: 'n1'
2022-09-02 09:37:46.355 [INFO ] <STORAGE-Recover-1>: CALL_SYNC, Syncing table slice{DB: 'testdb', Table: 't1', Slice: 'n1'} from sample gnode: '10.0.2.107'
2022-09-02 09:37:46.355 [INFO ] <STORAGE-Recover-1>: CALL_SYNC, Sync command line: 'start 'gc_sync_client' '10.0.2.107 testdb t1_n1 10.0.2.106 5258 3 1''
2022-09-02 09:37:46.934 [INFO ] <STORAGE-Recover-1>: check drop sql thread quit.
2022-09-02 09:37:46.935 [INFO ] <STORAGE-Recover-1>: start_sync_by_node, sync table{start 'gc_sync_client' '10.0.2.107 testdb t1_n1 10.0.2.106 5258 3 1'} returned with (0)
2022-09-02 09:37:46.935 [INFO ] <STORAGE-Recover-1>: CALL_SYNC, Sync client executed successfully
2022-09-02 09:37:46.935 [INFO ] <STORAGE-Recover-1>: REFRESH_TABLE, Refreshing local gnode, DB: 'testdb', Table: 't1', slice: 'n1'
2022-09-02 09:37:46.935 [INFO ] <STORAGE-Recover-1>: REFRESH_TABLE, Refresh query: 'refresh table `testdb`.`t1_n1`;'
2022-09-02 09:37:46.935 [INFO ] <STORAGE-Recover-1>: Sync cl
2022-09-02 09:37:46.935 [INFO ] <STORAGE-Recover-1>: CALL_SYNC, Sync end, return 0
2022-09-02 09:37:46.935 [INFO ] <CLockResource>: Lock: testdb.t1.09B5BEEC-1EF7-4FA6-9850-C4217A781E0F start
2022-09-02 09:37:46.937 [INFO ] <CLockResource>: Lock: testdb.t1.09B5BEEC-1EF7-4FA6-9850-C4217A781E0F end success
2022-09-02 09:37:46.937 [INFO ] <STORAGE-Recover-1>: CHECK_SYNC_RESULT,Checking whether SYNC is succeeded{DB: 'testdb', Table: 't1', slice: 'n1'}
2022-09-02 09:37:46.942 [INFO ] <STORAGE-Recover-1>: CHECK_SYNC_RESULT, SCN of table slice of local GNode and sample GNode(number of working nodes: 2)
2022-09-02 09:37:46.942 [INFO ] <STORAGE-Recover-1>: CHECK_SYNC_RESULT,        Local GNode: 18
2022-09-02 09:37:46.942 [INFO ] <STORAGE-Recover-1>: CHECK_SYNC_RESULT,    Sample GNode[1]: 18
2022-09-02 09:37:46.942 [INFO ] <CLockResource>: UnLock: testdb.t1.09B5BEEC-1EF7-4FA6-9850-C4217A781E0F unlock success
2022-09-02 09:37:46.943 [INFO ] <STORAGE-Recover-1>: CreateTableInfoFromSafeGroup, do CallSync success
2022-09-02 09:37:46.944 [INFO ] <CLockResource>: UnLock: testdb unlock success
2022-09-02 09:37:46.945 [INFO ] <CLockResource>: UnLock: testdb.t1.rsync unlock success
2022-09-02 09:37:46.947 [INFO ] <CLockResource>: UnLock: testdb.t1824A6746-7C28-4B39-B1E6-D6E088F858AA unlock success
2022-09-02 09:37:46.948 [INFO ] <CLockResource>: UnLock: testdb.t1D88707C8-BB5F-41D8-AAF9-55A76780953B unlock success
2022-09-02 09:37:46.957 [INFO ] <STORAGE-Recover-1>: CreateTableInfoFromSafeGeoup success!
2022-09-02 09:37:46.957 [INFO ] <STORAGE-Recover-1>: delete DMLStorge Redo Log start, eventId=2.............
2022-09-02 09:37:46.966 [INFO ] <STORAGE-Recover-1>: delete DMLStorge Redo Log end, eventId=2.............
2022-09-02 09:37:46.967 [INFO ] <RECOVER-INFO-1>: Finishing Recovering testdb.t1,tid 7