I’m building an event-based system where event properties (defined as columns) can be one of 3 data types (string, boolean, Float32). I’m trying to decide between creating a separate column for each type, or each column is a tuple.
For example…
one column per type:
CREATE TABLE event
(
event_name String,
timestamp DateTime DEFAULT now(),
-- Property with 3 types
prop1_str String,
prop1_num Float32,
prop1_bool Boolean
)
ENGINE = MergeTree ORDER BY (event_name, timestamp)
one column per property with 3-type tuple:
CREATE TABLE event
(
event_name String,
timestamp DateTime DEFAULT now(),
-- Property with 3 types
prop1 Tuple(
str String,
num Float32,
bool Boolean
)
)
ENGINE = MergeTree ORDER BY (event_name, timestamp)
The one column per property feels more organized, but my gut suspects that this might be less performant for querying.
>Solution :
Old versions (20…) read all tuple columns even if you query only one element of a Tuple. This issue is resolved.
There is no difference for modern versions of Clickhouse. The same storage and the same compute is in both cases.
3 columns
insert into event select '', now(), toString(number), number, number%2 from numbers(1e8);
0 rows in set. Elapsed: 9.311 sec. Processed 100.65 million rows, 805.21 MB (10.81 million rows/s., 86.48 MB/s.)
select count() from event where prop1_num > 42;
1 row in set. Elapsed: 0.116 sec. Processed 100.00 million rows, 400.00 MB (862.20 million rows/s., 3.45 GB/s.)
select count() from event where prop1_str like '%42%';
1 row in set. Elapsed: 1.135 sec. Processed 100.00 million rows, 1.69 GB (88.08 million rows/s., 1.49 GB/s.)
tuple
insert into event select '', now(), tuple(toString(number), number, number%2) from numbers(1e8);
0 rows in set. Elapsed: 8.807 sec. Processed 100.65 million rows, 805.21 MB (11.43 million rows/s., 91.43 MB/s.)
select count() from event where prop1.num > 42;
1 row in set. Elapsed: 0.117 sec. Processed 100.00 million rows, 400.00 MB (854.28 million rows/s., 3.42 GB/s.)
select count() from event where prop1.str like '%42%';
1 row in set. Elapsed: 1.129 sec. Processed 100.00 million rows, 1.69 GB (88.59 million rows/s., 1.50 GB/s.)