Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

In sql, what is the most efficient way to find nearest pairs of points for two tables

let’s say i have tableA.points which has a bunch of geographic points.

and i have tableB.points which has a bunch of geographic points, as well.

i want to compare every row in tableA.points to every row in tableB.points, and list the closest (shortest distance) point for each row in tableA.points.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

so, actually two points:

  1. the only way i can think of is a cartesian product (cross join) or a aggregate correlated subquery.
  2. the issue with the correlated subquery is that i can’t attach tableB.Identifer to the outer query.

for example:

SELECT
  tableA.*
  ,(SELECT
      CONCAT(tableB.IDENTIFIER,':',MIN(ST_DISTANCE(tableA.points, tableB.points))
    FROM tableB) as closest_point
FROM tableA

The big problem here is that i have over a million rows for each of these tables…

Any advice?

edit: i could also use zip codes to maybe trim down each comparison in the subquery, for example WHERE tableA.zip = tableB.zip but still not sure how to get that identifier…

>Solution :

A simple way to find the nearest point is to aggregate sorting by distance, and take the first element, something like:

SELECT 
  table_a.idenfitier,
  ARRAY_AGG(table_b.identifier 
            ORDER BY ST_Distance(table_a.geog, table_b.geog) 
            LIMIT 1)
      [ORDINAL(1)] 
FROM ...
GROUP BY table_a.idenfitier

But as you correctly mentioned, cross product is very slow, so if you can – restrict the join to a limited radius using ST_DWithin.
Couple blog posts describe the ideas how to do this in BigQuery in details

https://mentin.medium.com/nearest-neighbor-in-bigquery-gis-7d50ebd5d63

https://mentin.medium.com/nearest-neighbor-using-bq-scripting-373241f5b2f5

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading