Home In sql, what is the most efficient way to find nearest pairs of points for two tables

Questions

In sql, what is the most efficient way to find nearest pairs of points for two tables

July 29, 2022

let’s say i have tableA.points which has a bunch of geographic points.

and i have tableB.points which has a bunch of geographic points, as well.

i want to compare every row in tableA.points to every row in tableB.points, and list the closest (shortest distance) point for each row in tableA.points.

so, actually two points:

the only way i can think of is a cartesian product (cross join) or a aggregate correlated subquery.
the issue with the correlated subquery is that i can’t attach tableB.Identifer to the outer query.

for example:

SELECT
  tableA.*
  ,(SELECT
      CONCAT(tableB.IDENTIFIER,':',MIN(ST_DISTANCE(tableA.points, tableB.points))
    FROM tableB) as closest_point
FROM tableA

The big problem here is that i have over a million rows for each of these tables…

Any advice?

edit: i could also use zip codes to maybe trim down each comparison in the subquery, for example WHERE tableA.zip = tableB.zip but still not sure how to get that identifier…

>Solution :

A simple way to find the nearest point is to aggregate sorting by distance, and take the first element, something like:

SELECT 
  table_a.idenfitier,
  ARRAY_AGG(table_b.identifier 
            ORDER BY ST_Distance(table_a.geog, table_b.geog) 
            LIMIT 1)
      [ORDINAL(1)] 
FROM ...
GROUP BY table_a.idenfitier

But as you correctly mentioned, cross product is very slow, so if you can – restrict the join to a limited radius using ST_DWithin.
Couple blog posts describe the ideas how to do this in BigQuery in details

https://mentin.medium.com/nearest-neighbor-in-bigquery-gis-7d50ebd5d63

https://mentin.medium.com/nearest-neighbor-using-bq-scripting-373241f5b2f5