Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Performance of tuple-based columns

I’m building an event-based system where event properties (defined as columns) can be one of 3 data types (string, boolean, Float32). I’m trying to decide between creating a separate column for each type, or each column is a tuple.

For example…

one column per type:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

CREATE TABLE event
(
  event_name String,
  timestamp DateTime DEFAULT now(),
  
  -- Property with 3 types
  prop1_str String,
  prop1_num Float32,
  prop1_bool Boolean
)
ENGINE = MergeTree ORDER BY (event_name, timestamp)

one column per property with 3-type tuple:

CREATE TABLE event
(
  event_name String,
  timestamp DateTime DEFAULT now(),
  
  -- Property with 3 types
  prop1 Tuple(
    str String, 
    num Float32, 
    bool Boolean
  )
)
ENGINE = MergeTree ORDER BY (event_name, timestamp)

The one column per property feels more organized, but my gut suspects that this might be less performant for querying.

>Solution :

Old versions (20…) read all tuple columns even if you query only one element of a Tuple. This issue is resolved.

There is no difference for modern versions of Clickhouse. The same storage and the same compute is in both cases.

3 columns

insert into event select '', now(), toString(number), number, number%2 from numbers(1e8);
0 rows in set. Elapsed: 9.311 sec. Processed 100.65 million rows, 805.21 MB (10.81 million rows/s., 86.48 MB/s.)

select count() from event where prop1_num > 42;
1 row in set. Elapsed: 0.116 sec. Processed 100.00 million rows, 400.00 MB (862.20 million rows/s., 3.45 GB/s.)

select count() from event where prop1_str like '%42%';
1 row in set. Elapsed: 1.135 sec. Processed 100.00 million rows, 1.69 GB (88.08 million rows/s., 1.49 GB/s.)

tuple

insert into event select '', now(), tuple(toString(number), number, number%2) from numbers(1e8);
0 rows in set. Elapsed: 8.807 sec. Processed 100.65 million rows, 805.21 MB (11.43 million rows/s., 91.43 MB/s.)

select count() from event where prop1.num > 42;
1 row in set. Elapsed: 0.117 sec. Processed 100.00 million rows, 400.00 MB (854.28 million rows/s., 3.42 GB/s.)

select count() from event where prop1.str like '%42%';
1 row in set. Elapsed: 1.129 sec. Processed 100.00 million rows, 1.69 GB (88.59 million rows/s., 1.50 GB/s.)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading