从Clickhouse执行计划看跳数/稀疏索引的效果

Cliclhouse支持跳数索引(Data Skipping Indexes),本文从执行计划上查找跳数/稀疏索引对执行计划和性能的影响。

环境

普通表

CREATE TABLE testdb.t1
(
    `id` Int32,
    `name` String,
    `birth` Date
)
ENGINE = MergeTree
ORDER BY id

带跳数/稀疏索引的表

本例子中,用名字的长度,length(name),做索引。

CREATE TABLE testdb.tt1
(
    `id` Int32,
    `name` String,
    `birth` Date,
    INDEX a length(name) TYPE set(1000) GRANULARITY 4
)
ENGINE = MergeTree
ORDER BY id

数据量

都是6710万行(67108864)。从相同的表,insert select 复制过来的。

localhost :) select count() from t1;

SELECT count()
FROM t1

Query id: f84c9c2a-a93c-4eaa-b3cd-a5edd171be0b

┌──count()─┐
│ 67108864 │
└──────────┘

1 rows in set. Elapsed: 0.003 sec.

localhost :) select count() from tt1;

SELECT count()
FROM tt1

Query id: d82f42cd-56f0-4697-b2ea-89c743d75b78

┌──count()─┐
│ 67108864 │
└──────────┘

1 rows in set. Elapsed: 0.003 sec.

localhost :)

查看Clickhouse的执行计划

注意其中的数据读取位置,分别是从原始的存储(ReadFromStorage),和预处理的源(ReadFromPreparedSource)。

无索引表的执行计划

localhost :) explain select * from t1 where length(name)=10;

EXPLAIN
SELECT *
FROM t1
WHERE length(name) = 10

Query id: a9431920-a102-4e98-bb63-1bcf27d1cfe4

┌─explain───────────────────────────────────────────────────────────────────┐
│ Expression ((Projection + Before ORDER BY))                               │
│   SettingQuotaAndLimits (Set limits and quota after reading from storage) │
│     ReadFromStorage (MergeTree)                                           │
└───────────────────────────────────────────────────────────────────────────┘

3 rows in set. Elapsed: 0.004 sec.

localhost :)

带跳数/稀疏索引的执行计划

localhost :) explain select * from tt1 where length(name)=10;

EXPLAIN
SELECT *
FROM tt1
WHERE length(name) = 10

Query id: cbe2695b-7f0a-4c59-b4f4-bd69eba72dab

┌─explain───────────────────────────────────────────────────────────────────┐
│ Expression ((Projection + Before ORDER BY))                               │
│   SettingQuotaAndLimits (Set limits and quota after reading from storage) │
│     ReadFromPreparedSource (Read from NullSource)                         │
└───────────────────────────────────────────────────────────────────────────┘

3 rows in set. Elapsed: 0.004 sec.

localhost :)

执行性能结果对比

命中数量少时,索引效果明显

6700万只命中1行,性能差距一个数量级以上。

localhost :) select count(*) from t1 where length(name)=8;

SELECT count(*)
FROM t1
WHERE length(name) = 8

Query id: 46c953f0-5b52-4162-a030-c628b0261e2e

┌─count()─┐
│       1 │
└─────────┘

1 rows in set. Elapsed: 0.458 sec. Processed 67.11 million rows, 939.52 MB (146.47 million rows/s., 2.05 GB/s.)

localhost :) select count(*) from tt1 where length(name)=8;

SELECT count(*)
FROM tt1
WHERE length(name) = 8

Query id: ecbba9cc-a728-4194-b814-9ffe6dffdcc7

┌─count()─┐
│       1 │
└─────────┘

1 rows in set. Elapsed: 0.013 sec.

localhost :)

命中数量少时执行计划

localhost :) explain select count(*) from t1 where length(name)=6;

EXPLAIN
SELECT count(*)
FROM t1
WHERE length(name) = 6

Query id: 3e40c992-cb35-427c-a666-cdec4497d430

┌─explain─────────────────────────────────────────────────────────────────────────┐
│ Expression ((Projection + Before ORDER BY))                                     │
│   Aggregating                                                                   │
│     Expression (Before GROUP BY)                                                │
│       Filter (WHERE)                                                            │
│         SettingQuotaAndLimits (Set limits and quota after reading from storage) │
│           ReadFromStorage (MergeTree)                                           │
└─────────────────────────────────────────────────────────────────────────────────┘

6 rows in set. Elapsed: 0.013 sec.

localhost :) explain select count(*) from tt1 where length(name)=6;

EXPLAIN
SELECT count(*)
FROM tt1
WHERE length(name) = 6

Query id: 3ef5e2fd-d210-4549-8cb4-b8db899f0a8d

┌─explain─────────────────────────────────────────────────────────────────────────┐
│ Expression ((Projection + Before ORDER BY))                                     │
│   Aggregating                                                                   │
│     Expression (Before GROUP BY)                                                │
│       Filter (WHERE)                                                            │
│         SettingQuotaAndLimits (Set limits and quota after reading from storage) │
│           ReadFromPreparedSource (Read from NullSource)                         │
└─────────────────────────────────────────────────────────────────────────────────┘

6 rows in set. Elapsed: 0.014 sec.

localhost :)

命中大量结果,无实质影响

因为评估的结果,命中大量数据,不再走索引,后面的执行计划也能看出差距。

localhost :) select count(*) from t1 where length(name)=5;

SELECT count(*)
FROM t1
WHERE length(name) = 5

Query id: b72a577a-6c1b-4fa5-aaea-d65a7900b14e

┌──count()─┐
│ 67108864 │
└──────────┘

1 rows in set. Elapsed: 0.447 sec. Processed 67.11 million rows, 939.52 MB (150.24 million rows/s., 2.10 GB/s.)

localhost :) select count(*) from tt1 where length(name)=5;

SELECT count(*)
FROM tt1
WHERE length(name) = 5

Query id: 984daf4c-f674-417b-b251-e53b31ba01e8

┌──count()─┐
│ 67108864 │
└──────────┘

1 rows in set. Elapsed: 0.454 sec. Processed 67.11 million rows, 939.52 MB (147.71 million rows/s., 2.07 GB/s.)

localhost :)

命中大量结果执行计划

完全一样的执行计划,都是从存储读取了。

localhost :) explain select count(*) from t1 where length(name)=5;

EXPLAIN
SELECT count(*)
FROM t1
WHERE length(name) = 5

Query id: c5ce5b52-bc5b-4315-8546-48bb74eb510c

┌─explain─────────────────────────────────────────────────────────────────────────┐
│ Expression ((Projection + Before ORDER BY))                                     │
│   Aggregating                                                                   │
│     Expression (Before GROUP BY)                                                │
│       Filter (WHERE)                                                            │
│         SettingQuotaAndLimits (Set limits and quota after reading from storage) │
│           ReadFromStorage (MergeTree)                                           │
└─────────────────────────────────────────────────────────────────────────────────┘

6 rows in set. Elapsed: 0.004 sec.

localhost :) explain select count(*) from tt1 where length(name)=5;

EXPLAIN
SELECT count(*)
FROM tt1
WHERE length(name) = 5

Query id: 9d1c9b43-344e-4e8b-91a1-2af55e417090

┌─explain─────────────────────────────────────────────────────────────────────────┐
│ Expression ((Projection + Before ORDER BY))                                     │
│   Aggregating                                                                   │
│     Expression (Before GROUP BY)                                                │
│       Filter (WHERE)                                                            │
│         SettingQuotaAndLimits (Set limits and quota after reading from storage) │
│           ReadFromStorage (MergeTree)                                           │
└─────────────────────────────────────────────────────────────────────────────────┘

6 rows in set. Elapsed: 0.011 sec.