GBase 8a: Removing an Unrecoverable Failed Node by Scaling In (Operation Log)

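If a node suffers a permanent failure that cannot be repaired, and the remaining nodes can still carry the existing workload, GBase 8a can remove the failed node by scaling in the cluster and rebuilding its primary/backup relationships. This article walks through the procedure with a real example.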

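In this article, "failed node" means a data (compute) node. It is strongly recommended to deploy management, scheduling, and compute nodes on separate hosts rather than mixing roles, unless the cluster is small and cost takes priority.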

Environment

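A 3-node cluster in which node 115 (10.0.2.115) has failed. This operation scales in the failed node 115 and, while we are at it, node 102 (10.0.2.102) as well.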

The database version is 9.5.2.

[gbase@gbase_rh7_001 ~]$ gcadmin
CLUSTER STATE:         ACTIVE
VIRTUAL CLUSTER MODE:  NORMAL

=============================================================
|           GBASE COORDINATOR CLUSTER INFORMATION           |
=============================================================
|   NodeName   | IpAddress  | gcware | gcluster | DataState |
-------------------------------------------------------------
| coordinator1 | 10.0.2.101 |  OPEN  |   OPEN   |     0     |
-------------------------------------------------------------
=========================================================================================================
|                                    GBASE DATA CLUSTER INFORMATION                                     |
=========================================================================================================
| NodeName |                IpAddress                 | DistributionId | gnode | syncserver | DataState |
---------------------------------------------------------------------------------------------------------
|  node1   |                10.0.2.101                |       7        | OPEN  |    OPEN    |     0     |
---------------------------------------------------------------------------------------------------------
|  node2   |                10.0.2.102                |       7        | OPEN  |    OPEN    |     0     |
---------------------------------------------------------------------------------------------------------
|  node3   |                10.0.2.115                |       7        | CLOSE |   CLOSE    |     0     |
---------------------------------------------------------------------------------------------------------
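The failed node can be picked out of this listing by eye, but on a larger cluster a small parser helps. The following is a minimal sketch, assuming the `gcadmin` output has been captured as text and keeps the six-column data-node layout shown above; the function names `parse_data_nodes` and `failed_nodes` are hypothetical helpers, not part of GBase.

```python
from typing import Dict, List

# Sample modeled on the listing above (column widths shortened).
SAMPLE = """\
| coordinator1 | 10.0.2.101 |  OPEN  |   OPEN   |     0     |
|            GBASE DATA CLUSTER INFORMATION            |
| NodeName | IpAddress  | DistributionId | gnode | syncserver | DataState |
|  node1   | 10.0.2.101 |       7        | OPEN  |    OPEN    |     0     |
|  node2   | 10.0.2.102 |       7        | OPEN  |    OPEN    |     0     |
|  node3   | 10.0.2.115 |       7        | CLOSE |   CLOSE    |     0     |
"""


def parse_data_nodes(gcadmin_output: str) -> List[Dict[str, str]]:
    """Return one dict per data-node row of a captured gcadmin listing."""
    nodes = []
    in_data_section = False
    for raw in gcadmin_output.splitlines():
        line = raw.strip()
        if "GBASE DATA CLUSTER INFORMATION" in line:
            in_data_section = True  # rows before this belong to coordinators
            continue
        if not in_data_section or not line.startswith("|"):
            continue  # skips separator lines and anything outside the table
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) != 6 or cells[0] == "NodeName":
            continue  # skips the header row
        name, ip, dist_id, gnode, syncserver, data_state = cells
        nodes.append({
            "name": name, "ip": ip, "distribution_id": dist_id,
            "gnode": gnode, "syncserver": syncserver, "data_state": data_state,
        })
    return nodes


def failed_nodes(nodes: List[Dict[str, str]]) -> List[str]:
    """IPs of data nodes whose gnode or syncserver service is not OPEN."""
    return [n["ip"] for n in nodes
            if n["gnode"] != "OPEN" or n["syncserver"] != "OPEN"]
```

Calling `failed_nodes(parse_data_nodes(SAMPLE))` on the sample yields only the 10.0.2.115 entry. A CLOSE state alone does not prove the failure is permanent; that judgment still has to be made by the operator.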

Scale-in procedure

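The procedure is exactly the same as an ordinary scale-in. The main point here is to show that scale-in also works while a node is down.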

Prepare the node list for a distribution that excludes the failed node and the node scheduled for removal

[gbase@gbase_rh7_001 gcinstall_9.5.2.44.10]$ cat gcChangeInfo_one.xml
<?xml version="1.0" encoding="utf-8"?>
<servers>
 <rack>
  <node ip="10.0.2.101"/>
 </rack>
</servers>
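With only one surviving node the file is easy to write by hand. For larger clusters it could be generated; the sketch below is one way to do that, assuming the single-rack layout shown above. `build_node_list_xml` is a hypothetical helper name, and the exclusion list must be supplied by the operator.

```python
from typing import Iterable
from xml.sax.saxutils import quoteattr


def build_node_list_xml(all_ips: Iterable[str], excluded_ips: Iterable[str]) -> str:
    """Build a <servers>/<rack>/<node> file listing every IP not excluded.

    Mirrors the single-rack layout of gcChangeInfo_one.xml in this article.
    """
    excluded = set(excluded_ips)
    lines = ['<?xml version="1.0" encoding="utf-8"?>', "<servers>", " <rack>"]
    for ip in all_ips:
        if ip not in excluded:
            # quoteattr adds the surrounding quotes and escapes the value
            lines.append(f"  <node ip={quoteattr(ip)}/>")
    lines += [" </rack>", "</servers>"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    cluster = ["10.0.2.101", "10.0.2.102", "10.0.2.115"]
    to_remove = ["10.0.2.102", "10.0.2.115"]
    print(build_node_list_xml(cluster, to_remove), end="")
```

Run as a script, this prints a file with the same content as the one shown above, containing only 10.0.2.101.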

Generate the new distribution

[gbase@gbase_rh7_001 gcinstall_9.5.2.44.10]$ gcadmin distribution gcChangeInfo_one.xml p 1 d 0
gcadmin generate distribution ...

[warning]: parameter [d num] is 0, the new distribution will has no segment backup
please ensure this is ok, input [Y,y] or [N,n]: y
gcadmin generate distribution successful

[gbase@gbase_rh7_001 gcinstall_9.5.2.44.10]$ 

Initialize and rebalance

[gbase@gbase_rh7_001 gcinstall_9.5.2.44.10]$ gccli

GBase client 9.5.2.44.1045e3118. Copyright (c) 2004-2022, GBase.  All Rights Reserved.

gbase> initnodedatamap;
Query OK, 0 rows affected, 3 warnings (Elapsed: 00:00:00.53)

gbase> rebalance instance;
Query OK, 11 rows affected (Elapsed: 00:00:00.74)

Wait for the rebalance to finish.
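The article does not show how the wait is done. One common approach is to poll until no table is still being rebalanced. The sketch below shows only the polling loop; `fetch_pending` is a hypothetical callable that you would implement yourself (for example by counting unfinished rows in the cluster's rebalance status table through `gccli`), and the interval and timeout values are arbitrary assumptions.

```python
import time
from typing import Callable


def wait_for_rebalance(
    fetch_pending: Callable[[], int],
    poll_seconds: float = 10.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Poll until fetch_pending() reports 0 unfinished tables.

    Returns the number of polls performed.
    Raises TimeoutError if work is still pending once the timeout has passed.
    """
    deadline = clock() + timeout_seconds
    polls = 0
    while True:
        polls += 1
        pending = fetch_pending()  # hypothetical: number of tables not yet done
        if pending == 0:
            return polls
        if clock() >= deadline:
            raise TimeoutError(f"{pending} table(s) still rebalancing")
        sleep(poll_seconds)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting. The cleanup steps that follow should only start after this returns.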

Clean up the environment

Drop the nodedatamap of the old distribution (ID 7)

[gbase@gbase_rh7_001 gcinstall_9.5.2.44.10]$ gccli

GBase client 9.5.2.44.1045e3118. Copyright (c) 2004-2022, GBase.  All Rights Reserved.

gbase> refreshnodedatamap drop 7;
Query OK, 0 rows affected, 3 warnings (Elapsed: 00:00:00.62)

gbase> ^CAborted
[gbase@gbase_rh7_001 gcinstall_9.5.2.44.10]$

Clear the events, because the rebalance leaves DDL/DML events recorded against the failed node.

[gbase@gbase_rh7_001 gcinstall_9.5.2.44.10]$ gcadmin rmdmlevent 2 10.0.2.115
[gbase@gbase_rh7_001 gcinstall_9.5.2.44.10]$ gcadmin rmddlevent 2 10.0.2.115

Remove the old distribution

[gbase@gbase_rh7_001 gcinstall_9.5.2.44.10]$ gcadmin rmdistribution 7
cluster distribution ID [7]
it will be removed now
please ensure this is ok, input [Y,y] or [N,n]: y
gcadmin remove distribution [7] success
[gbase@gbase_rh7_001 gcinstall_9.5.2.44.10]$ gcadmin
CLUSTER STATE:         ACTIVE
VIRTUAL CLUSTER MODE:  NORMAL

=============================================================
|           GBASE COORDINATOR CLUSTER INFORMATION           |
=============================================================
|   NodeName   | IpAddress  | gcware | gcluster | DataState |
-------------------------------------------------------------
| coordinator1 | 10.0.2.101 |  OPEN  |   OPEN   |     0     |
-------------------------------------------------------------
=========================================================================================================
|                                    GBASE DATA CLUSTER INFORMATION                                     |
=========================================================================================================
| NodeName |                IpAddress                 | DistributionId | gnode | syncserver | DataState |
---------------------------------------------------------------------------------------------------------
|  node1   |                10.0.2.101                |       8        | OPEN  |    OPEN    |     0     |
---------------------------------------------------------------------------------------------------------
|  node2   |                10.0.2.102                |                | OPEN  |    OPEN    |     0     |
---------------------------------------------------------------------------------------------------------
|  node3   |                10.0.2.115                |                | CLOSE |   CLOSE    |     0     |
---------------------------------------------------------------------------------------------------------

[gbase@gbase_rh7_001 gcinstall_9.5.2.44.10]$
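In the listing above, the DistributionId column is empty for 10.0.2.102 and 10.0.2.115, which is the state in which this article goes on to remove them. A minimal sketch of checking that automatically, assuming captured `gcadmin` text in the layout shown; `still_in_distribution` is a hypothetical helper name.

```python
import re
from typing import Iterable, List

# Matches data-node rows: | name | dotted IP | digits-or-empty | ...
# Coordinator rows do not match because their third column is a state word.
_ROW = re.compile(r"^\|\s*(\S+)\s*\|\s*([\d.]+)\s*\|\s*(\d*)\s*\|")

# Sample modeled on the listing above (column widths shortened).
AFTER = """\
| coordinator1 | 10.0.2.101 |  OPEN  |   OPEN   |     0     |
|  node1   | 10.0.2.101 |       8        | OPEN  |    OPEN    |     0     |
|  node2   | 10.0.2.102 |                | OPEN  |    OPEN    |     0     |
|  node3   | 10.0.2.115 |                | CLOSE |   CLOSE    |     0     |
"""


def still_in_distribution(gcadmin_output: str, ips: Iterable[str]) -> List[str]:
    """Return those of `ips` that still show a DistributionId in the listing."""
    wanted = set(ips)
    blocked = []
    for line in gcadmin_output.splitlines():
        match = _ROW.match(line.strip())
        if match and match.group(2) in wanted and match.group(3):
            blocked.append(match.group(2))
    return blocked
```

An empty result for the nodes to be removed means none of them is still referenced by a distribution in the listing. The check does not replace reading the `gcadmin` output itself.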

Remove the scaled-in nodes

[gbase@gbase_rh7_001 gcinstall_9.5.2.44.10]$ cat rmnodes.xml
<?xml version="1.0" encoding="utf-8"?>
<servers>
 <rack>
  <node ip="10.0.2.102"/>
  <node ip="10.0.2.115"/>
 </rack>
</servers>
[gbase@gbase_rh7_001 gcinstall_9.5.2.44.10]$ gcadmin rmnodes rmnodes.xml
gcadmin remove nodes ...


gcadmin rmnodes from cluster success

[gbase@gbase_rh7_001 gcinstall_9.5.2.44.10]$ gcadmin
CLUSTER STATE:         ACTIVE
VIRTUAL CLUSTER MODE:  NORMAL

=============================================================
|           GBASE COORDINATOR CLUSTER INFORMATION           |
=============================================================
|   NodeName   | IpAddress  | gcware | gcluster | DataState |
-------------------------------------------------------------
| coordinator1 | 10.0.2.101 |  OPEN  |   OPEN   |     0     |
-------------------------------------------------------------
=========================================================================================================
|                                    GBASE DATA CLUSTER INFORMATION                                     |
=========================================================================================================
| NodeName |                IpAddress                 | DistributionId | gnode | syncserver | DataState |
---------------------------------------------------------------------------------------------------------
|  node1   |                10.0.2.101                |       8        | OPEN  |    OPEN    |     0     |
---------------------------------------------------------------------------------------------------------

[gbase@gbase_rh7_001 gcinstall_9.5.2.44.10]$

Delete the database files on the removed nodes

rm -fr /opt/gbase/gcluster
rm -fr /opt/gbase/gnode
rm -fr /opt/gbase/gcware

Summary

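When a node is completely unusable, a GBase 8a cluster supports forcibly scaling it in and removing it from the cluster. The only difference from a normal scale-in is that the rebalance leaves events recorded against the failed node, and these must be cleared before the old distribution is removed.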