GBase 8a数据库数据不一致dmlevent故障模拟和恢复过程分析

GBase 8a数据库集群,通过副本来保证数据高可用,当某些服务或节点故障时,就会产生不一致,比如dmlevent。本文在测试环境模拟故障,并分析其恢复过程。

环境

3节点集群,关闭1个节点的数据库服务。V95版本。

[gbase@localhost gcluster]$ gcadmin
CLUSTER STATE:         ACTIVE
VIRTUAL CLUSTER MODE:  NORMAL

=============================================================
|           GBASE COORDINATOR CLUSTER INFORMATION           |
=============================================================
|   NodeName   | IpAddress  | gcware | gcluster | DataState |
-------------------------------------------------------------
| coordinator1 | 10.0.2.102 |  OPEN  |   OPEN   |     0     |
-------------------------------------------------------------
| coordinator2 | 10.0.2.202 |  OPEN  |   OPEN   |     0     |
-------------------------------------------------------------
| coordinator3 | 10.0.2.203 | CLOSE  |  CLOSE   |     0     |
-------------------------------------------------------------
=========================================================================================================
|                                    GBASE DATA CLUSTER INFORMATION                                     |
=========================================================================================================
| NodeName |                IpAddress                 | DistributionId | gnode | syncserver | DataState |
---------------------------------------------------------------------------------------------------------
|  node1   |                10.0.2.102                |       3        | OPEN  |    OPEN    |     0     |
---------------------------------------------------------------------------------------------------------
|  node2   |                10.0.2.202                |       3        | OPEN  |    OPEN    |     0     |
---------------------------------------------------------------------------------------------------------
|  node3   |                10.0.2.203                |       3        | CLOSE |   CLOSE    |     0     |
---------------------------------------------------------------------------------------------------------

模拟故障,insert一些数据

[gbase@localhost gcluster]$ gccli testdb

GBase client 9.5.2.17.115980. Copyright (c) 2004-2020, GBase.  All Rights Reserved.

gbase> create table t12(id int) distributed by('id');
Query OK, 0 rows affected (Elapsed: 00:00:01.02)

gbase> insert into t12 values(1);
Query OK, 1 row affected (Elapsed: 00:00:00.12)

gbase> insert into t12 values(2);
Query OK, 1 row affected (Elapsed: 00:00:00.19)

gbase> insert into t12 values(3);
Query OK, 1 row affected (Elapsed: 00:00:00.33)

gbase> ^CAborted
[gbase@localhost gcluster]$ gcadmin showdmlevent
Vc event count:0
[gbase@localhost gcluster]$ gccli testdb

GBase client 9.5.2.17.115980. Copyright (c) 2004-2020, GBase.  All Rights Reserved.

gbase> insert into t12 values(4);
Query OK, 1 row affected (Elapsed: 00:00:00.61)

gbase> insert into t12 values(5);
Query OK, 1 row affected (Elapsed: 00:00:00.48)

gbase> insert into t12 values(6);
Query OK, 1 row affected (Elapsed: 00:00:00.15)

gbase> show create table t12;\
+-------+-----------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                            |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------+
| t12   | CREATE TABLE "t12" (
  "id" int(11) DEFAULT NULL
) ENGINE=EXPRESS DISTRIBUTED BY('id') DEFAULT CHARSET=utf8 TABLESPACE='sys_tablespace' |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (Elapsed: 00:00:00.00)

gbase> ^CAborted
[gbase@localhost gcluster]$ gcadmin
CLUSTER STATE:         ACTIVE
VIRTUAL CLUSTER MODE:  NORMAL

=============================================================
|           GBASE COORDINATOR CLUSTER INFORMATION           |
=============================================================
|   NodeName   | IpAddress  | gcware | gcluster | DataState |
-------------------------------------------------------------
| coordinator1 | 10.0.2.102 |  OPEN  |   OPEN   |     0     |
-------------------------------------------------------------
| coordinator2 | 10.0.2.202 |  OPEN  |   OPEN   |     0     |
-------------------------------------------------------------
| coordinator3 | 10.0.2.203 | CLOSE  |  CLOSE   |     0     |
-------------------------------------------------------------
=========================================================================================================
|                                    GBASE DATA CLUSTER INFORMATION                                     |
=========================================================================================================
| NodeName |                IpAddress                 | DistributionId | gnode | syncserver | DataState |
---------------------------------------------------------------------------------------------------------
|  node1   |                10.0.2.102                |       3        | OPEN  |    OPEN    |     0     |
---------------------------------------------------------------------------------------------------------
|  node2   |                10.0.2.202                |       3        | OPEN  |    OPEN    |     0     |
---------------------------------------------------------------------------------------------------------
|  node3   |                10.0.2.203                |       3        | CLOSE |   CLOSE    |     1     |
---------------------------------------------------------------------------------------------------------

[gbase@localhost gcluster]$ gcadmin showdmlevent
Vc event count:1
Event ID:    5
ObjectName: testdb.t12

Fail Data Copy:
------------------------------------------------------
SegName: n2     SCN: 9222       NodeIP: 10.0.2.203      FAILURE
SegName: n3     SCN: 9224       NodeIP: 10.0.2.203      FAILURE


[gbase@localhost gcluster]$

如上可以看到,只有数据影响到了故障节点,才会被设置故障标记。

查看当前管理节点的express.log

关注设置event的部分,如下event id = 5,和event的信息一致。

2020-08-15 19:29:15.022 [DEF  ] [S:49][Q:60][insert] Set Node(10.0.2.203)testdb.t12[n2][scn:9222] data state
2020-08-15 19:29:15.076 [DEF  ] [S:49][Q:60]Successfully set data(testdb.t12:normal) LOCKED, event id is 5

2020-08-15 19:29:18.371 [DEF  ] [S:49][Q:61][insert] Set Node(10.0.2.203)testdb.t12[n2][scn:9223] data state
2020-08-15 19:29:18.399 [DEF  ] [S:49][Q:61]Successfully set data(testdb.t12:normal) LOCKED, event id is 5

2020-08-15 19:29:20.470 [DEF  ] [S:49][Q:62][insert] Set Node(10.0.2.203)testdb.t12[n3][scn:9224] data state
2020-08-15 19:29:20.510 [DEF  ] [S:49][Q:62]Successfully set data(testdb.t12:normal) LOCKED, event id is 5

恢复故障节点

[gbase@localhost gcluster]$ gcadmin
CLUSTER STATE:         ACTIVE
VIRTUAL CLUSTER MODE:  NORMAL

=============================================================
|           GBASE COORDINATOR CLUSTER INFORMATION           |
=============================================================
|   NodeName   | IpAddress  | gcware | gcluster | DataState |
-------------------------------------------------------------
| coordinator1 | 10.0.2.102 |  OPEN  |   OPEN   |     0     |
-------------------------------------------------------------
| coordinator2 | 10.0.2.202 |  OPEN  |   OPEN   |     0     |
-------------------------------------------------------------
| coordinator3 | 10.0.2.203 |  OPEN  |   OPEN   |     0     |
-------------------------------------------------------------
=========================================================================================================
|                                    GBASE DATA CLUSTER INFORMATION                                     |
=========================================================================================================
| NodeName |                IpAddress                 | DistributionId | gnode | syncserver | DataState |
---------------------------------------------------------------------------------------------------------
|  node1   |                10.0.2.102                |       3        | OPEN  |    OPEN    |     0     |
---------------------------------------------------------------------------------------------------------
|  node2   |                10.0.2.202                |       3        | OPEN  |    OPEN    |     0     |
---------------------------------------------------------------------------------------------------------
|  node3   |                10.0.2.203                |       3        | OPEN  |    OPEN    |     0     |
---------------------------------------------------------------------------------------------------------

查看gc_recover.log,看看恢复调度信息

2020-08-15 19:35:13.750 [INFO ] <session:7>: Start dml recover .,tid 357, eventid 5
2020-08-15 19:35:13.763 [INFO ] <session:7>: source node is 0xca02000a, table t12, suffix n2
2020-08-15 19:35:13.849 [INFO ] <session:7>: DealDMLRecoverLock vc00001.testdb.t12 success
2020-08-15 19:35:13.858 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n2,et:DML,eid:5,en:node,ip:10.0.2.203>: recoverinfo gn dml: eventid:5 DB(testdb), TABLE(t12), SLICE(2), TID(357), src:3389128714(10.0.2.202) dst:3405905930(10.0.2.203)
2020-08-15 19:35:13.924 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n2,et:DML,eid:5,en:node,ip:10.0.2.203>: PreWriteDMLSBeforeSync eventid 6
2020-08-15 19:35:13.924 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n2,et:DML,eid:5,en:node,ip:10.0.2.203>: QueryExecute ip 10.0.2.102,port 5258, do "select table_id from information_schema.tables where table_schema='testdb' and table_name='t12' and TABLE_VC='vcname000001'"
2020-08-15 19:35:13.926 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n2,et:DML,eid:5,en:node,ip:10.0.2.203>: MatchJustOneResult:357
2020-08-15 19:35:13.926 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n2,et:DML,eid:5,en:node,ip:10.0.2.203>: QueryExecute ip 10.0.2.203,port 5050, do "select scn from information_schema.tables where table_schema='testdb' and table_name='t12_n2'"
2020-08-15 19:35:13.928 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n2,et:DML,eid:5,en:node,ip:10.0.2.203>: MatchJustOneResult:9220
2020-08-15 19:35:13.928 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n2,et:DML,eid:5,en:node,ip:10.0.2.203>: QueryExecute ip 10.0.2.203,port 5050, do "FLUSH ROLLBACK "testdb"."t12_n2""
2020-08-15 19:35:13.929 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n2,et:DML,eid:5,en:node,ip:10.0.2.203>: QueryExecute ip 10.0.2.203,port 5050, do "flush transaction_log"
2020-08-15 19:35:13.929 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n2,et:DML,eid:5,en:node,ip:10.0.2.203>: QueryExecute ip 10.0.2.202,port 5050, do "flush transaction_log"
2020-08-15 19:35:13.930 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n2,et:DML,eid:5,en:node,ip:10.0.2.203>: QueryExecute ip 10.0.2.203,port 5050, do "start 'gc_sync_client' '10.0.2.202 vcname000001 testdb t12_n2 10.0.2.102 5258 2 0'"
2020-08-15 19:35:14.206 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n2,et:DML,eid:5,en:node,ip:10.0.2.203>: ExecSyncQuery, sync table returned with (0)
2020-08-15 19:35:14.206 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n2,et:DML,eid:5,en:node,ip:10.0.2.203>: check drop sql thread quit.
2020-08-15 19:35:14.206 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n2,et:DML,eid:5,en:node,ip:10.0.2.203>: QueryExecute ip 10.0.2.203,port 5050, do "flush transaction_log"
2020-08-15 19:35:14.207 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n2,et:DML,eid:5,en:node,ip:10.0.2.203>: QueryExecute ip 10.0.2.202,port 5050, do "flush transaction_log"
2020-08-15 19:35:14.207 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n2,et:DML,eid:5,en:node,ip:10.0.2.203>: CALL_SYNC, Sync client executed successfully
2020-08-15 19:35:14.207 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n2,et:DML,eid:5,en:node,ip:10.0.2.203>: QueryExecute ip 10.0.2.203,port 5050, do "refresh table "testdb"."t12_n2""
2020-08-15 19:35:14.208 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n2,et:DML,eid:5,en:node,ip:10.0.2.203>: Sync client returned errcode:[0],errmsg:[success]
2020-08-15 19:35:14.208 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n2,et:DML,eid:5,en:node,ip:10.0.2.203>: CALL_SYNC, Sync end, return 0
2020-08-15 19:35:14.212 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n2,et:DML,eid:5,en:node,ip:10.0.2.203>: CHECK_SYNC_RESULT,Checking whether SYNC is succeeded{DB: 'testdb', Table: 't12', slice: 'n2'}
2020-08-15 19:35:14.212 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n2,et:DML,eid:5,en:node,ip:10.0.2.203>: QueryExecute ip 10.0.2.203,port 5050, do "select scn from information_schema.tables where table_schema='testdb' and table_name='t12_n2'"
2020-08-15 19:35:14.213 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n2,et:DML,eid:5,en:node,ip:10.0.2.203>: MatchJustOneResult:9223
2020-08-15 19:35:14.213 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n2,et:DML,eid:5,en:node,ip:10.0.2.203>: QueryExecute ip 10.0.2.202,port 5050, do "refresh table "testdb"."t12_n2""
2020-08-15 19:35:14.213 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n2,et:DML,eid:5,en:node,ip:10.0.2.203>: QueryExecute ip 10.0.2.202,port 5050, do "select scn from information_schema.tables where table_schema='testdb' and table_name='t12_n2'"
2020-08-15 19:35:14.214 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n2,et:DML,eid:5,en:node,ip:10.0.2.203>: MatchJustOneResult:9223
2020-08-15 19:35:14.214 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n2,et:DML,eid:5,en:node,ip:10.0.2.203>: CHECK_SYNC_RESULT,        recover GNode: 9223
2020-08-15 19:35:14.214 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n2,et:DML,eid:5,en:node,ip:10.0.2.203>: CHECK_SYNC_RESULT,      source GNode[1]: 9223
2020-08-15 19:35:14.214 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n2,et:DML,eid:5,en:node,ip:10.0.2.203>: delete dml event, table testdb.t12,event id 5 suffix n2
2020-08-15 19:35:14.265 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n2,et:DML,eid:5,en:node,ip:10.0.2.203>: SET_ONLINE, Succeeded in setting local node table 'testdb.t12_n2' online, eventinfo(eventid:5,segId:2)
2020-08-15 19:35:14.265 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n2,et:DML,eid:5,en:node,ip:10.0.2.203>: CheckToClearOrKeepPreWriteDMLS delete eventid 6
2020-08-15 19:35:14.279 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n2,et:DML,eid:5,en:node,ip:10.0.2.203>: delete dmlstorage event, event id: 6
2020-08-15 19:35:14.312 [INFO ] <session:7>: DealDMLRecoverUnLock vc00001.testdb.t12
2020-08-15 19:35:14.524 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n2,et:DML,eid:5,en:node,ip:10.0.2.203>: GetSourceNodeForDMl for nodeId(3405905930), the node is inValid
2020-08-15 19:35:14.524 [INFO ] <session:7>: source node is 0x6602000a, table t12, suffix n3
2020-08-15 19:35:14.554 [INFO ] <session:7>: DealDMLRecoverLock vc00001.testdb.t12 success
2020-08-15 19:35:14.561 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n3,et:DML,eid:5,en:node,ip:10.0.2.203>: recoverinfo gn dml: eventid:5 DB(testdb), TABLE(t12), SLICE(3), TID(357), src:1711407114(10.0.2.102) dst:3405905930(10.0.2.203)
2020-08-15 19:35:14.566 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n3,et:DML,eid:5,en:node,ip:10.0.2.203>: PreWriteDMLSBeforeSync eventid 7
2020-08-15 19:35:14.566 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n3,et:DML,eid:5,en:node,ip:10.0.2.203>: QueryExecute ip 10.0.2.102,port 5258, do "select table_id from information_schema.tables where table_schema='testdb' and table_name='t12' and TABLE_VC='vcname000001'"
2020-08-15 19:35:14.567 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n3,et:DML,eid:5,en:node,ip:10.0.2.203>: MatchJustOneResult:357
2020-08-15 19:35:14.567 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n3,et:DML,eid:5,en:node,ip:10.0.2.203>: QueryExecute ip 10.0.2.203,port 5050, do "select scn from information_schema.tables where table_schema='testdb' and table_name='t12_n3'"
2020-08-15 19:35:14.569 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n3,et:DML,eid:5,en:node,ip:10.0.2.203>: MatchJustOneResult:9221
2020-08-15 19:35:14.569 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n3,et:DML,eid:5,en:node,ip:10.0.2.203>: QueryExecute ip 10.0.2.203,port 5050, do "FLUSH ROLLBACK "testdb"."t12_n3""
2020-08-15 19:35:14.570 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n3,et:DML,eid:5,en:node,ip:10.0.2.203>: QueryExecute ip 10.0.2.203,port 5050, do "flush transaction_log"
2020-08-15 19:35:14.570 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n3,et:DML,eid:5,en:node,ip:10.0.2.203>: QueryExecute ip 10.0.2.102,port 5050, do "flush transaction_log"
2020-08-15 19:35:14.571 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n3,et:DML,eid:5,en:node,ip:10.0.2.203>: QueryExecute ip 10.0.2.203,port 5050, do "start 'gc_sync_client' '10.0.2.102 vcname000001 testdb t12_n3 10.0.2.102 5258 2 0'"
2020-08-15 19:35:14.921 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n3,et:DML,eid:5,en:node,ip:10.0.2.203>: ExecSyncQuery, sync table returned with (0)
2020-08-15 19:35:14.921 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n3,et:DML,eid:5,en:node,ip:10.0.2.203>: check drop sql thread quit.
2020-08-15 19:35:14.922 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n3,et:DML,eid:5,en:node,ip:10.0.2.203>: QueryExecute ip 10.0.2.203,port 5050, do "flush transaction_log"
2020-08-15 19:35:14.922 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n3,et:DML,eid:5,en:node,ip:10.0.2.203>: QueryExecute ip 10.0.2.102,port 5050, do "flush transaction_log"
2020-08-15 19:35:14.923 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n3,et:DML,eid:5,en:node,ip:10.0.2.203>: CALL_SYNC, Sync client executed successfully
2020-08-15 19:35:14.923 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n3,et:DML,eid:5,en:node,ip:10.0.2.203>: QueryExecute ip 10.0.2.203,port 5050, do "refresh table "testdb"."t12_n3""
2020-08-15 19:35:14.924 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n3,et:DML,eid:5,en:node,ip:10.0.2.203>: Sync client returned errcode:[0],errmsg:[success]
2020-08-15 19:35:14.924 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n3,et:DML,eid:5,en:node,ip:10.0.2.203>: CALL_SYNC, Sync end, return 0
2020-08-15 19:35:14.929 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n3,et:DML,eid:5,en:node,ip:10.0.2.203>: CHECK_SYNC_RESULT,Checking whether SYNC is succeeded{DB: 'testdb', Table: 't12', slice: 'n3'}
2020-08-15 19:35:14.929 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n3,et:DML,eid:5,en:node,ip:10.0.2.203>: QueryExecute ip 10.0.2.203,port 5050, do "select scn from information_schema.tables where table_schema='testdb' and table_name='t12_n3'"
2020-08-15 19:35:14.930 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n3,et:DML,eid:5,en:node,ip:10.0.2.203>: MatchJustOneResult:9224
2020-08-15 19:35:14.930 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n3,et:DML,eid:5,en:node,ip:10.0.2.203>: QueryExecute ip 10.0.2.102,port 5050, do "refresh table "testdb"."t12_n3""
2020-08-15 19:35:14.931 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n3,et:DML,eid:5,en:node,ip:10.0.2.203>: QueryExecute ip 10.0.2.102,port 5050, do "select scn from information_schema.tables where table_schema='testdb' and table_name='t12_n3'"
2020-08-15 19:35:14.932 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n3,et:DML,eid:5,en:node,ip:10.0.2.203>: MatchJustOneResult:9224
2020-08-15 19:35:14.932 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n3,et:DML,eid:5,en:node,ip:10.0.2.203>: CHECK_SYNC_RESULT,        recover GNode: 9224
2020-08-15 19:35:14.932 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n3,et:DML,eid:5,en:node,ip:10.0.2.203>: CHECK_SYNC_RESULT,      source GNode[1]: 9224
2020-08-15 19:35:14.932 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n3,et:DML,eid:5,en:node,ip:10.0.2.203>: delete dml event, table testdb.t12,event id 5 suffix n3
2020-08-15 19:35:14.952 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n3,et:DML,eid:5,en:node,ip:10.0.2.203>: SET_ONLINE, Succeeded in setting local node table 'testdb.t12_n3' online, eventinfo(eventid:5,segId:3)
2020-08-15 19:35:14.952 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n3,et:DML,eid:5,en:node,ip:10.0.2.203>: CheckToClearOrKeepPreWriteDMLS delete eventid 7
2020-08-15 19:35:14.962 [INFO ] <session:7, tid:357, db:testdb,tb:t12,nx:n3,et:DML,eid:5,en:node,ip:10.0.2.203>: delete dmlstorage event, event id: 7
2020-08-15 19:35:14.973 [INFO ] <session:7>: DealDMLRecoverUnLock vc00001.testdb.t12
2020-08-15 19:35:15.175 [INFO ] <session:7>: End dml recover .,tid 357

如上能看到调用了 sync_client程序进行恢复,然后调用了refresh刷新,最后设置节点状态为online。
分别处理了n2和n3分片

查看同步日志

在故障节点gnode的日志目录下,也是n2和n3两个日志,如下只贴一个。

[gbase@localhost gbase]$ cat syncclient_vcname000001_testdb_t12_n2_2020-08-15.log
2020-08-15 19:35:13.494 6377 INFO: read from config file:
2020-08-15 19:35:13.494 6377 INFO: log path is /opt/gbase/gnode/log/gbase
2020-08-15 19:35:13.494 6377 INFO: log level is 3
2020-08-15 19:35:13.494 6377 INFO: sync level is 1
2020-08-15 19:35:13.494 6377 INFO: server parallel is 4
2020-08-15 19:35:13.494 6377 INFO: server port is 5288
2020-08-15 19:35:13.494 6377 INFO: block size is 512
2020-08-15 19:35:13.494 6377 INFO: read from parameter:
2020-08-15 19:35:13.494 6377 INFO: server ip address is 10.0.2.202
2020-08-15 19:35:13.494 6377 INFO: table vc is vcname000001
2020-08-15 19:35:13.494 6377 INFO: sync database name is testdb
2020-08-15 19:35:13.494 6377 INFO: sync table name is t12_n2
2020-08-15 19:35:13.494 6377 INFO: gcluster ip address is 10.0.2.102
2020-08-15 19:35:13.494 6377 INFO: gcluster port is 5258
2020-08-15 19:35:13.494 6377 INFO: check method is 2
2020-08-15 19:35:13.494 6377 INFO: gcluster lock is 0
2020-08-15 19:35:13.494 6377 INFO: gnode lock is 1
2020-08-15 19:35:13.494 6377 INFO: double check is 0
2020-08-15 19:35:13.494 6377 INFO: truncate space is 0
2020-08-15 19:35:13.494 6377 INFO: TaskID is
2020-08-15 19:35:13.495 6377 INFO: Socket.cpp:45 10.0.2.202 is not a valid ipv6 address!
2020-08-15 19:35:13.495 6377 INFO: Start Sync Table vcname000001.testdb.t12_n2 from 10.0.2.202
2020-08-15 19:35:13.495 6377 INFO: =============InitTableInfo=================
2020-08-15 19:35:13.512 6377 CRITICAL: TableInfo.cpp:2526 can't find "default-character-set" key in config file
2020-08-15 19:35:13.512 6377 WARNING: TableInfo.cpp:645 can't get charset from config file /opt/gbase/gnode/config/gbase_8a_gbase.cnf.
2020-08-15 19:35:13.513 6377 INFO: =============Sync Table Info And Table Struct=================
2020-08-15 19:35:13.520 6377 INFO: Connect to gnode server ok

2020-08-15 19:35:13.520 6377 INFO: TableInfo.cpp:2116 GN Lock table testdb.t12_n2
2020-08-15 19:35:13.520 6377 INFO: Lock table testdb.t12_n2 start
2020-08-15 19:35:13.520 6377 INFO: TableInfo.cpp:2182 Lock table testdb.t12_n2 successfully
2020-08-15 19:35:13.520 6377 WARNING: FileOperation.cpp:228 access file /opt/gbase/gnode/userdata/gbase/testdb/metadata/t12_n2.par failure! errno:2 errmsg:No such file or directory
2020-08-15 19:35:13.520 6377 WARNING: TableInfo.cpp:2552 file /opt/gbase/gnode/userdata/gbase/testdb/metadata/t12_n2.par not exist.
2020-08-15 19:35:13.520 6377 WARNING: TableInfo.cpp:1502 No par file /opt/gbase/gnode/userdata/gbase/testdb/metadata/t12_n2.par
2020-08-15 19:35:13.520 6377 INFO: Local Table t12_n2 DataVersion:1

2020-08-15 19:35:13.520 6377 INFO: Local Table t12_n2 State:1

2020-08-15 19:35:13.520 6377 INFO: Local Table t12_n2 Des:0

2020-08-15 19:35:13.520 6377 INFO: Local Table t12_n2 Del:1

2020-08-15 19:35:13.520 6377 INFO: Local Table t12_n2 Ctl:1

2020-08-15 19:35:13.520 6377 INFO: Table t12_n2 max scn is  9220
2020-08-15 19:35:13.520 6377 WARNING: FileOperation.cpp:228 access file /opt/gbase/gnode/userdata/gbase/testdb/tablespace.cnf failure! errno:2 errmsg:No such file or directory
2020-08-15 19:35:13.520 6377 INFO: TableInfo.cpp:2790 tablespace.cnf file not exist. path: /opt/gbase/gnode/userdata/gbase/testdb/tablespace.cnf
2020-08-15 19:35:13.520 6377 INFO: TableInfo.cpp:1154 col 0 m_CurrentReadLoc = 0 m_CurrentReadVersion=0
2020-08-15 19:35:13.521 6377 INFO:
===  table information  ===
CREATE TABLE "t12_n2" (
  "id" int(11) DEFAULT NULL
) ENGINE=EXPRESS DEFAULT CHARSET=utf8 TABLESPACE='sys_tablespace' COLUMN_IDS(0)
===========================
2020-08-15 19:35:13.521 6377 INFO: TableInfo.cpp:2202 GN Unlock table testdb.t12_n2
2020-08-15 19:35:13.522 6377 INFO: TableInfo.cpp:2240 Unlock table testdb.t12_n2 successfully
2020-08-15 19:35:13.526 6377 INFO: Server Table t12_n2 DataVersion:1

2020-08-15 19:35:13.526 6377 INFO: Server Table t12_n2 Info:1

2020-08-15 19:35:13.526 6377 INFO: Server Table t12_n2 Map:0

2020-08-15 19:35:13.526 6377 INFO: Server Table t12_n2 Del:1

2020-08-15 19:35:13.526 6377 INFO: Server Table t12_n2 Ctl:1

2020-08-15 19:35:13.526 6377 INFO: =============Sync Table=================
2020-08-15 19:35:13.537 6377 CRITICAL: TableSyncClient.cpp:1551 normal table, des or frm file not exist.
2020-08-15 19:35:13.537 6377 WARNING: FileOperation.cpp:228 access file /opt/gbase/gnode/userdata/gbase/testdb/tablespace.cnf failure! errno:2 errmsg:No such file or directory
2020-08-15 19:35:13.537 6377 INFO: SyncTable column file NoCols: 1
2020-08-15 19:35:13.537 6377 INFO: Sync Col DC Diff Info 1
2020-08-15 19:35:13.537 6377 INFO: ============Sync column DC Diff start. column:1============
2020-08-15 19:35:13.537 6377 INFO: ============Sync Col Mount Info============
2020-08-15 19:35:13.537 6377 INFO: ============Sync map file content============
2020-08-15 19:35:13.537 6377 INFO: ============Sync Column different DCInfo============
2020-08-15 19:35:13.537 6377 INFO: TableSyncClient.cpp:1859 Diff DC number is 1. first different DC is 0.
2020-08-15 19:35:13.537 6377 INFO: SyncColFile 1
2020-08-15 19:35:13.537 6377 INFO: ============Sync column file start. column:1============
2020-08-15 19:35:13.537 6377 INFO: ============Sync column file end. column::1============
2020-08-15 19:35:13.537 6377 INFO: RecvType
2020-08-15 19:35:13.607 6377 INFO: Calculate Number of RecvDC
2020-08-15 19:35:13.607 6377 INFO: RecvColDCFile
2020-08-15 19:35:13.688 6377 INFO: SyncTableFile
2020-08-15 19:35:13.688 6377 INFO: TableSyncClient.cpp:1563 normal table, par file not exist.
2020-08-15 19:35:13.688 6377 WARNING: FileOperation.cpp:228 access file /opt/gbase/gnode/userdata/gbase/testdb/metadata/t12_n2.GED/table.delete.B failure! errno:2 errmsg:No such file or directory
2020-08-15 19:35:13.688 6377 INFO: TableSyncClient.cpp:1617 partition table, logic table, table.delete file not exist.
2020-08-15 19:35:13.689 6377 INFO: TableSyncClient.cpp:2781 File sync:/opt/gbase/gnode/userdata/gbase/testdb/metadata/t12_n2.GED/table.info.B.
2020-08-15 19:35:13.713 6377 INFO: TableSyncClient.cpp:2781 File sync:/opt/gbase/gnode/userdata/gbase/testdb/metadata/t12_n2.GED/table.map.A.
2020-08-15 19:35:13.730 6377 INFO: TableSyncClient.cpp:2781 File sync:/opt/gbase/gnode/userdata/gbase/testdb/metadata/t12_n2.frm.
2020-08-15 19:35:13.741 6377 INFO: TableSyncClient.cpp:2781 File sync:/opt/gbase/gnode/userdata/gbase/testdb/sys_tablespace/t12_n2/C00000.seg.
2020-08-15 19:35:13.767 6377 INFO: =============Sync Table vcname000001.testdb.t12_n2 Success=================