南大通用GBase 8a 加载LOAD数据无法入库错误 skipped records,The number of data columns is less than the number of columns defined

GBase 8a 在加载LOAD时,如果发现数据和目标表不匹配,会将错误数据和原因,写入到日志里。 其中The number of data columns is less than the number of columns defined 错误是因为文件列数少于表的指定列数。数据按照分隔符切分的字段数量和表或指定的字段数量不匹配导致。

错误原因日志是trc结尾的,错误数据是err结尾的。

如下是测试用例

表结构

测试样例

一个2列的表

gbase> create table tload_2(id int, name varchar(200));
Query OK, 0 rows affected (Elapsed: 00:00:00.13)

gbase> desc tload_2;
+-------+--------------+------+-----+---------+-------+
| Field | Type         | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| id    | int(11)      | YES  |     | NULL    |       |
| name  | varchar(200) | YES  |     | NULL    |       |
+-------+--------------+------+-----+---------+-------+
2 rows in set (Elapsed: 00:00:00.00)

样例数据

制造了异常数据,有字段少,也有字段多

[gbase@localhost ~]$ cat /home/gbase/tload.txt
1
2,Second
3,Third,2020-01-02
4,Fouth,2020-14-16
[gbase@localhost ~]$

执行加载

有1行正常入库,3行错误数据

gbase> load data infile 'sftp://gbase:gbase1234@10.0.2.107/home/gbase/tload.txt' into table tload_2;
Query OK, 1 row affected (Elapsed: 00:00:00.68)
Task 1572889 finished, Loaded 1 records, Skipped 3 records

查看表数据

确实有1行

gbase> select * from tload_2;
+------+--------+
| id   | name   |
+------+--------+
|    2 | Second |
+------+--------+
1 row in set (Elapsed: 00:00:00.00)

查看错误日志和数据

此处为了方便,直接用SQL方式读取。可以看到报错信息 The number of data columns is less than the number of columns defined. 还有text parser error。注意用SQL方式,可以从结果里直接看到错误数据。如果从文件查看,是分成2个文件保存的。

gbase> show load logs limit 3;
+---------+---------+---------+-------------------+--------------------------------------------------------+-------------+------------+------------+------------------------------------------------------------------------+---------------------+
| task_id | db_name | tb_name | err_data_ip       | file_name                                              | file_offset | record_len | err_column | err_reason                                                             | err_data            |
+---------+---------+---------+-------------------+--------------------------------------------------------+-------------+------------+------------+------------------------------------------------------------------------+---------------------+
| 1572889 | testdb  | tload_2 | ::ffff:10.0.2.107 | sftp://gbase:*********@10.0.2.107/home/gbase/tload.txt | 0           | 2          | 1          | The number of data columns is less than the number of columns defined
 | 1
                  |
| 1572889 | testdb  | tload_2 | ::ffff:10.0.2.107 | sftp://gbase:*********@10.0.2.107/home/gbase/tload.txt | 11          | 19         | 3          | text parser error
                                                     | 3,Third,2020-01-02
 |
| 1572889 | testdb  | tload_2 | ::ffff:10.0.2.107 | sftp://gbase:*********@10.0.2.107/home/gbase/tload.txt | 30          | 19         | 3          | text parser error
                                                     | 4,Fouth,2020-14-16
 |
+---------+---------+---------+-------------------+--------------------------------------------------------+-------------+------------+------------+------------------------------------------------------------------------+---------------------+

通过文件查看错误日志和数据

[gbase@localhost 1572883]$ cd /opt/gcluster/log/gcluster/loader_logs/1572889
[gbase@localhost 1572889]$ ll
total 12
-rw------- 1 gbase gbase 220 Sep  3 14:16 1572889_loader_result.log
-rw------- 1 gbase gbase  40 Sep  3 14:16 1572889_testdb_tload_2_n1_::ffff:10.0.2.107_20200903141603.err
-rw------- 1 gbase gbase 368 Sep  3 14:16 1572889_testdb_tload_2_n1_::ffff:10.0.2.107_20200903141603.trc
[gbase@localhost 1572889]$ cat 1572889_testdb_tload_2_n1_\:\:ffff\:10.0.2.107_20200903141603.trc
file_name    |   file_offset    |   record_len   |   column    |    reason
sftp://gbase:*********@10.0.2.107/home/gbase/tload.txt|0|2|1|The number of data columns is less than the number of columns defined
sftp://gbase:*********@10.0.2.107/home/gbase/tload.txt|11|19|3|text parser error
sftp://gbase:*********@10.0.2.107/home/gbase/tload.txt|30|19|3|text parser error
[gbase@localhost 1572889]$
[gbase@localhost 1572889]$ cat 1572889_testdb_tload_2_n1_\:\:ffff\:10.0.2.107_20200903141603.err
1
3,Third,2020-01-02
4,Fouth,2020-14-16

解决方案

对于数据字段比表字段多的情况,可以通过table_fields指定来规避。请参考

GBase 8a 集群加载数据LOAD数据和字段不匹配的处理方案

其它加载错误,请参考

GBase 8a数据库加载LOAD报错信息分析和解决文章汇总

南大通用GBase 8a 加载LOAD数据无法入库错误 skipped records,The number of data columns is less than the number of columns defined》有1条评论

评论已关闭。