南大通用GBase 8a的如何拿到行号,rowid,类似oracle的rowid

本文介绍GBase 8a数据库集群内部,对没一行数据的行号rowid的方案。如Oracle等数据库,提供了rowid来唯一标识一行数据,在GBase里也提供了rowid,但因为是MPP,所以在随机分布表里,每个节点是各自独立的,只有复制表才能保证一致性。

测试环境

2节点集群

[gbase@localhost ~]$ gcadmin
CLUSTER STATE:  ACTIVE
CLUSTER MODE:   NORMAL

=================================================================
|             GBASE COORDINATOR CLUSTER INFORMATION             |
=================================================================
|   NodeName   |     IpAddress     |gcware |gcluster |DataState |
-----------------------------------------------------------------
| coordinator1 |    10.0.2.107     | OPEN  |  OPEN   |    0     |
-----------------------------------------------------------------
| coordinator2 |    10.0.2.106     | OPEN  |  OPEN   |    0     |
-----------------------------------------------------------------
=============================================================
|              GBASE DATA CLUSTER INFORMATION               |
=============================================================
|NodeName |     IpAddress     |gnode |syncserver |DataState |
-------------------------------------------------------------
|  node1  |    10.0.2.107     | OPEN |   OPEN    |    0     |
-------------------------------------------------------------
|  node2  |    10.0.2.106     | OPEN |   OPEN    |    0     |
-------------------------------------------------------------

测试hash分布表的行号rowid

随机表的rowid, 会出现重复的,因为每个数据节点都是从0开始的rowid。

gbase> create table test_rowid(id int, value varchar(100)) distributed by('id');
Query OK, 0 rows affected (Elapsed: 00:00:00.26)

gbase> insert into test_rowid values(1,'111'),(2,'2222'),(3,'3333');
Query OK, 3 rows affected (Elapsed: 00:00:00.16)
Records: 3  Duplicates: 0  Warnings: 0

gbase> insert into test_rowid values(4,'444'),(5,'555'),(6,'666');
Query OK, 3 rows affected (Elapsed: 00:00:00.21)
Records: 3  Duplicates: 0  Warnings: 0

gbase> insert into test_rowid values(7,'777'),(8,'888'),(9,'999');
Query OK, 3 rows affected (Elapsed: 00:00:00.13)
Records: 3  Duplicates: 0  Warnings: 0

gbase> select rowid,t.* from test_rowid t;
+-------+------+-------+
| rowid | id   | value |
+-------+------+-------+
|     0 |    1 | 111   |
|     1 |    2 | 2222  |
|     2 |    3 | 3333  |
|     3 |    4 | 444   |
|     4 |    5 | 555   |
|     0 |    6 | 666   |
|     1 |    7 | 777   |
|     2 |    8 | 888   |
|     3 |    9 | 999   |
+-------+------+-------+
9 rows in set (Elapsed: 00:00:00.00)

gbase> ^CAborted

测试随机分布表行号rowid

依然会出现重复的行号rowid

[gbase@localhost ~]$ cat test_rowid.txt
11,1111111
22,22222222
33,333333333
44,44444444
55,555555555
66,666666666
77,7777777777
88,8888888888
[gbase@localhost ~]$

gbase> create table test_rowid_2(id int, value varchar(100));
Query OK, 0 rows affected (Elapsed: 00:00:00.27)

gbase> load data infile 'sftp://gbase:gbase1234@10.0.2.107//home/gbase/test_rowid.txt' into table test_rowid_2 fields terminated by ',';
Query OK, 8 rows affected (Elapsed: 00:00:00.98)
Task 1835164 finished, Loaded 8 records, Skipped 0 records

gbase> select rowid,t.* from test_rowid_2 t;
+-------+------+------------+
| rowid | id   | value      |
+-------+------+------------+
|     0 |   11 | 1111111    |
|     1 |   33 | 333333333  |
|     2 |   55 | 555555555  |
|     3 |   77 | 7777777777 |
|     0 |   22 | 22222222   |
|     1 |   44 | 44444444   |
|     2 |   66 | 666666666  |
|     3 |   88 | 8888888888 |
+-------+------+------------+
8 rows in set (Elapsed: 00:00:00.01)


测试复制表行号rowid

因为复制表,每个节点数据都一样,所以行号rowid是唯一的。


gbase> create table test_rowid_r(id int, value varchar(100)) replicated;
Query OK, 0 rows affected (Elapsed: 00:00:00.17)

gbase> load data infile 'sftp://gbase:gbase1234@10.0.2.107//home/gbase/test_rowid.txt' into table test_rowid_r fields terminated by ',';
Query OK, 8 rows affected (Elapsed: 00:00:00.90)
Task 1835166 finished, Loaded 8 records, Skipped 0 records

gbase> select rowid,t.* from test_rowid_r t;
+-------+------+------------+
| rowid | id   | value      |
+-------+------+------------+
|     0 |   11 | 1111111    |
|     1 |   22 | 22222222   |
|     2 |   33 | 333333333  |
|     3 |   44 | 44444444   |
|     4 |   55 | 555555555  |
|     5 |   66 | 666666666  |
|     6 |   77 | 7777777777 |
|     7 |   88 | 8888888888 |
+-------+------+------------+
8 rows in set (Elapsed: 00:00:00.01)

结论

在当前已经发行的版本,由于行号rowid是依赖数据节点本身的,所以集群层并没有统一的一个行号,所以如果需要行号,可以用复制表。

从业务上,还是希望用户自行管理类似功能,比如新的V9版本支持了自增列,不要依赖行号。