GBase 8a在执行时。为避免并发冲突保证一致性,会持有一些锁来保证自己需要的资源在执行期间不会出现问题。锁在SQL执行完毕后会自动释放掉。在某些特殊场景下,特别是一些老版本集群,出现需要强行释放掉锁的需求,比如SQL长时间无法结束,而该SQL持有的锁又导致其它的SQL无法正常运行,同时环境又不能重启节点服务时,可以考虑本文的方法强行释放SQL持有的锁。
对于DDL,DML类的SQL,一般要配合failover清理:GBase 8a查看和清理故障恢复状态Failover的方法
目录导航
参考
GBase8a MPP Cluster查看集群锁 showlock
GBase 8a查看和清理故障恢复状态Failover的方法
构造锁的测试场景
如下通过lock table语句构造了一个持有锁的SQL场景,连接节点为201。
集群版本:8.6.2.43-R33.129391
[root@rh6-1 ~]# gcadmin
CLUSTER STATE:  ACTIVE
CLUSTER MODE:   NORMAL
=================================================================
|             GBASE COORDINATOR CLUSTER INFORMATION             |
=================================================================
|   NodeName   |     IpAddress     |gcware |gcluster |DataState |
-----------------------------------------------------------------
| coordinator1 |    10.0.2.201     | OPEN  |  OPEN   |    0     |
-----------------------------------------------------------------
=============================================================
|              GBASE DATA CLUSTER INFORMATION               |
=============================================================
|NodeName |     IpAddress     |gnode |syncserver |DataState |
-------------------------------------------------------------
|  node1  |    10.0.2.201     | OPEN |   OPEN    |    0     |
-------------------------------------------------------------
|  node2  |    10.0.2.202     | OPEN |   OPEN    |    0     |
-------------------------------------------------------------
gbase> lock table t1 write;
Query OK, 0 rows affected (Elapsed: 00:00:00.01)
gbase> show processlist;
+-----+-----------------+-----------+--------+---------+-------+-----------------------------+------------------+
| Id  | User            | Host      | db     | Command | Time  | State                       | Info             |
+-----+-----------------+-----------+--------+---------+-------+-----------------------------+------------------+
|   1 | event_scheduler | localhost | NULL   | Daemon  | 42262 | Waiting for next activation | NULL             |
| 258 | root            | localhost | testdb | Query   |     0 | NULL                        | show processlist |
+-----+-----------------+-----------+--------+---------+-------+-----------------------------+------------------+
2 rows in set (Elapsed: 00:00:00.00)
查看SQL持有的锁
通过SQL查看锁
可以通过show detail processlist查看SQL当前持有的锁,需要的锁等信息。详情请参考
GBase8a 显示集群正在跑的SQL进程show [full | detail] processlist
可以看到ID=258的连接持有的3个锁,用分号分割:testdb;testdb.t1;testdb.t1580D5F90-B287-4199-B057-E6FBD44B5BFA
gbase> show detail processlist;
+-----+-------+-----------------+-----------+--------+---------+-------+-----------------------------+----------------------------------------------------------------+------+-------------------------+
| Id  | Tid   | User            | Host      | db     | Command | Time  | State                       | Lock                                                           | Wait | Info                    |
+-----+-------+-----------------+-----------+--------+---------+-------+-----------------------------+----------------------------------------------------------------+------+-------------------------+
|   1 | 18648 | event_scheduler | localhost | NULL   | Daemon  | 42525 | Waiting for next activation | NULL                                                           | NULL | NULL                    |
| 259 |  9985 | root            | localhost | testdb | Query   |     0 | NULL                        | testdb;testdb.t1;testdb.t1580D5F90-B287-4199-B057-E6FBD44B5BFA | NULL | show detail processlist |
+-----+-------+-----------------+-----------+--------+---------+-------+-----------------------------+----------------------------------------------------------------+------+-------------------------+
2 rows in set (Elapsed: 00:00:00.00)
通过gcadmin查看锁
通过命令行的gcadmin showlock查看锁,详情请参考
GBase8a MPP Cluster查看集群锁 showlock
[root@rh6-1 ~]# gcadmin showlock
 +===================================================================================================================+
 |                                                   GCLUSTER LOCK                                                   |
 +===================================================================================================================+
 +---------------------------------------------+----------+-------------------------------+--------------+------+----+
 |                  Lock name                  |  owner   |            content            | create time  |orphan|type|
 +---------------------------------------------+----------+-------------------------------+--------------+------+----+
 |                   testdb                    |10.0.2.201|LOCK_Lock_Tables: 259(LWP:9985)|20220223090419|FALSE | S  |
 +---------------------------------------------+----------+-------------------------------+--------------+------+----+
 |testdb.t1580d5f90-b287-4199-b057-e6fbd44b5bfa|10.0.2.201|LOCK_Lock_Tables: 259(LWP:9985)|20220223090419|FALSE | E  |
 +---------------------------------------------+----------+-------------------------------+--------------+------+----+
 |                  testdb.t1                  |10.0.2.201|LOCK_Lock_Tables: 259(LWP:9985)|20220223090419|FALSE | S  |
 +---------------------------------------------+----------+-------------------------------+--------------+------+----+
 Total : 3
[root@rh6-1 ~]#
通过python查看锁
通过python接口,从gcware.getlocks拿到锁。前面gcadmin showlock的信息就是从这里获得然后格式化输出的。请关注其中的
name:锁名字
lockid:锁ID,这个在showlock里没有显示出来。
[gbase@rh6-1 ubas]$ python
Python 2.6.6 (r266:84292, Oct 12 2012, 14:23:48)
[GCC 4.4.6 20120305 (Red Hat 4.4.6-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import gcware
>>> help('gcware')
>>> gcware.getlocks()
({'create time': '20220223090419', 'name': 'testdb', 'lockid': 1612718352306798592, 'orphan': 'FALSE', 'content': 'LOCK_Lock_Tables: 259(LWP:9985)', 'owner': '10.0.2.201', 'type': 'SHARE'}, {'create time': '20220223090419', 'name': 'testdb.t1580d5f90-b287-4199-b057-e6fbd44b5bfa', 'lockid': 8587815309107265537, 'orphan': 'FALSE', 'content': 'LOCK_Lock_Tables: 259(LWP:9985)', 'owner': '10.0.2.201', 'type': 'EXCLUSIVE'}, {'create time': '20220223090419', 'name': 'testdb.t1', 'lockid': 7728385150517510147, 'orphan': 'FALSE', 'content': 'LOCK_Lock_Tables: 259(LWP:9985)', 'owner': '10.0.2.201', 'type': 'SHARE'})
>>>
根据锁名字,从里面找到对应的lockid, 用于清理锁的参数。先找工具格式化一下,方便观看
{
	'create time': '20220223090419',
	'name': 'testdb',
	'lockid': 1612718352306798592,
	'orphan': 'FALSE',
	'content': 'LOCK_Lock_Tables: 259(LWP:9985)',
	'owner': '10.0.2.201',
	'type': 'SHARE'
}, {
	'create time': '20220223090419',
	'name': 'testdb.t1580d5f90-b287-4199-b057-e6fbd44b5bfa',
	'lockid': 8587815309107265537,
	'orphan': 'FALSE',
	'content': 'LOCK_Lock_Tables: 259(LWP:9985)',
	'owner': '10.0.2.201',
	'type': 'EXCLUSIVE'
}, {
	'create time': '20220223090419',
	'name': 'testdb.t1',
	'lockid': 7728385150517510147,
	'orphan': 'FALSE',
	'content': 'LOCK_Lock_Tables: 259(LWP:9985)',
	'owner': '10.0.2.201',
	'type': 'SHARE'
}
通过python清理锁的方法
我们尝试释放一个排它锁
	'create time': '20220223090419',
	'name': 'testdb.t1580d5f90-b287-4199-b057-e6fbd44b5bfa',
	'lockid': 8587815309107265537,
	'orphan': 'FALSE',
	'content': 'LOCK_Lock_Tables: 259(LWP:9985)',
	'owner': '10.0.2.201',
	'type': 'EXCLUSIVE'
接口
gcware.unlocktableforce(name,lockid)
执行效果
>>> gcware.unlocktableforce('testdb.t1580d5f90-b287-4199-b057-e6fbd44b5bfa',8587815309107265537);
0
>>> 
查看锁确实少了一个
[root@rh6-1 ~]# gcadmin showlock
 +===============================================================================+
 |                                 GCLUSTER LOCK                                 |
 +===============================================================================+
 +---------+----------+-------------------------------+--------------+------+----+
 |Lock name|  owner   |            content            | create time  |orphan|type|
 +---------+----------+-------------------------------+--------------+------+----+
 | testdb  |10.0.2.201|LOCK_Lock_Tables: 259(LWP:9985)|20220223090419|FALSE | S  |
 +---------+----------+-------------------------------+--------------+------+----+
 |testdb.t1|10.0.2.201|LOCK_Lock_Tables: 259(LWP:9985)|20220223090419|FALSE | S  |
 +---------+----------+-------------------------------+--------------+------+----+
 Total : 2
再开一个连接,尝试拿到锁。发现可以正常拿到Lock table write, 但show detail processlist的持有锁信息并没有刷新,怀疑....
[gbase@rh6-1 ubas]$ gccli testdb
GBase client 8.6.2.43-R33.129391. Copyright (c) 2004-2022, GBase.  All Rights Reserved.
gbase> lock table t1 write;
Query OK, 0 rows affected (Elapsed: 00:00:00.01)
gbase> show detail processlist;
+-----+-------+-----------------+-----------+--------+---------+-------+-----------------------------+----------------------------------------------------------------+------+-------------------------+
| Id  | Tid   | User            | Host      | db     | Command | Time  | State                       | Lock                                                           | Wait | Info                    |
+-----+-------+-----------------+-----------+--------+---------+-------+-----------------------------+----------------------------------------------------------------+------+-------------------------+
|   1 | 18648 | event_scheduler | localhost | NULL   | Daemon  | 46724 | Waiting for next activation | NULL                                                           | NULL | NULL                    |
| 259 |  9985 | root            | localhost | testdb | Sleep   |    74 |                             | testdb;testdb.t1;testdb.t1580D5F90-B287-4199-B057-E6FBD44B5BFA | NULL | NULL                    |
| 267 | 10695 | root            | localhost | testdb | Query   |     0 | NULL                        | testdb;testdb.t1;testdb.t1580D5F90-B287-4199-B057-E6FBD44B5BFA | NULL | show detail processlist |
+-----+-------+-----------------+-----------+--------+---------+-------+-----------------------------+----------------------------------------------------------------+------+-------------------------+
3 rows in set (Elapsed: 00:00:00.00)
如果锁没有释放,会一直卡在这里
gbase> lock table t1 write;
^CQuery aborted by Ctrl+C
ERROR 1100 (HY000): Table 'testdb.t1' was not locked with LOCK TABLES
gbase>
查看实际的锁情况,已经更新
[root@rh6-1 ~]# gcadmin showlock
 +====================================================================================================================+
 |                                                   GCLUSTER LOCK                                                    |
 +====================================================================================================================+
 +---------------------------------------------+----------+--------------------------------+--------------+------+----+
 |                  Lock name                  |  owner   |            content             | create time  |orphan|type|
 +---------------------------------------------+----------+--------------------------------+--------------+------+----+
 |                   testdb                    |10.0.2.201|LOCK_Lock_Tables: 259(LWP:9985) |20220223090419|FALSE | S  |
 +---------------------------------------------+----------+--------------------------------+--------------+------+----+
 |                   testdb                    |10.0.2.201|LOCK_Lock_Tables: 267(LWP:10695)|20220223101414|FALSE | S  |
 +---------------------------------------------+----------+--------------------------------+--------------+------+----+
 |testdb.t1580d5f90-b287-4199-b057-e6fbd44b5bfa|10.0.2.201|LOCK_Lock_Tables: 267(LWP:10695)|20220223101414|FALSE | E  |
 +---------------------------------------------+----------+--------------------------------+--------------+------+----+
 |                  testdb.t1                  |10.0.2.201|LOCK_Lock_Tables: 259(LWP:9985) |20220223090419|FALSE | S  |
 +---------------------------------------------+----------+--------------------------------+--------------+------+----+
 |                  testdb.t1                  |10.0.2.201|LOCK_Lock_Tables: 267(LWP:10695)|20220223101414|FALSE | S  |
 +---------------------------------------------+----------+--------------------------------+--------------+------+----+
 Total : 5
总结
SQL持有的锁是为了一致性,强行清理锁可能导致不可预期的结果,比如数据错误,所以只有在没有其它方法解决,且能通过其它手段确保数据一致性的情况下(比如手工检查和同步数据),才能强行清理锁。
请慎重。