PostgresSQL: Getting number of correspondences between two arrays

August 29, 2022

I want to compare and get number of corresposndeces between two arrays in PostgresSQL.

For example:
given the table

id	array
1	a, b, d
2	a, e
3	a, b, d, e
4	b, c, d

I would like to obtain a comparison table:

id_1	id_2	number_of_matches
1	2	1
1	3	3
1	4	2
2	3	2
2	4	0
3	4	2

There is no repeated values in each array, and the order of items inside arrays does not matter.

I found here about the overlap operator (&&) but it only returns a boolean if there is or not any correspondence between two arrays, but not the number of correspondences.

CREATE TABLE data (id integer, val text[]);
INSERT INTO data VALUES (1, '{a,b,d}');
INSERT INTO data VALUES (2, '{a,e}');
INSERT INTO data VALUES (3, '{a,b,d,e}');
INSERT INTO data VALUES (4, '{b,c,d}');
SELECT a.id id_1, b.id id_2, a.val && b.val match 
FROM data a, data b
WHERE a.id < b.id;

returns

id_1	id_2	match
1	2	t
1	3	t
1	4	t
2	3	t
2	4	f
3	4	t

>Solution :

demo:db<>fiddle

WITH normalized AS (                      -- 2
    SELECT
        id,
        unnest(val) as array_element      -- 1
    FROM data
)
SELECT
    n1.id,
    n2.id,
    COUNT(*)
FROM normalized n1
JOIN normalized n2                        -- 3
    ON (n1.array_element = n2.array_element AND n1.id <> n2.id)
GROUP BY n1.id, n2.id                     -- 4
ORDER BY n1.id, n2.id

Normalize your data (note: You should think about normalizing your data directly in the table; means: no arrays) by creating one row per array element
Use a CTE to "store" your normalized result
Self-join your normalized result (but not the same ids) at the array_element
GROUP and COUNT