Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

PostgresSQL: Getting number of correspondences between two arrays

I want to compare and get number of corresposndeces between two arrays in PostgresSQL.

For example:
given the table

id array
1 a, b, d
2 a, e
3 a, b, d, e
4 b, c, d

I would like to obtain a comparison table:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

id_1 id_2 number_of_matches
1 2 1
1 3 3
1 4 2
2 3 2
2 4 0
3 4 2

There is no repeated values in each array, and the order of items inside arrays does not matter.

I found here about the overlap operator (&&) but it only returns a boolean if there is or not any correspondence between two arrays, but not the number of correspondences.

CREATE TABLE data (id integer, val text[]);
INSERT INTO data VALUES (1, '{a,b,d}');
INSERT INTO data VALUES (2, '{a,e}');
INSERT INTO data VALUES (3, '{a,b,d,e}');
INSERT INTO data VALUES (4, '{b,c,d}');
SELECT a.id id_1, b.id id_2, a.val && b.val match 
FROM data a, data b
WHERE a.id < b.id;

returns

id_1 id_2 match
1 2 t
1 3 t
1 4 t
2 3 t
2 4 f
3 4 t

>Solution :

demo:db<>fiddle

WITH normalized AS (                      -- 2
    SELECT
        id,
        unnest(val) as array_element      -- 1
    FROM data
)
SELECT
    n1.id,
    n2.id,
    COUNT(*)
FROM normalized n1
JOIN normalized n2                        -- 3
    ON (n1.array_element = n2.array_element AND n1.id <> n2.id)
GROUP BY n1.id, n2.id                     -- 4
ORDER BY n1.id, n2.id
  1. Normalize your data (note: You should think about normalizing your data directly in the table; means: no arrays) by creating one row per array element
  2. Use a CTE to "store" your normalized result
  3. Self-join your normalized result (but not the same ids) at the array_element
  4. GROUP and COUNT
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading