How to find rows in a pandas DataFrame with close values?

I need to find the ‘user_id’ values of users who are standing close to each other. Here is the data:

import pandas as pd

d = {'user_id': [11, 24, 101, 214, 302, 335],
     'worker_latitude': [-34.6209, -2.7572, 55.6621,
                         55.114462, 55.6622, -34.6209],
     'worker_longitude': [-58.3742, 52.3879, 56.6621, 38.927156,
                          56.6622, 39.018]}
df = pd.DataFrame(data=d)
df
   user_id  worker_latitude  worker_longitude
0       11       -34.620900        -58.374200
1       24        -2.757200         52.387900
2      101        55.662100         56.662100
3      214        55.114462         38.927156
4      302        55.662200         56.662200
5      335       -34.620900         39.018000

In this sample, those would be the users with IDs ‘101’ and ‘302’.
However, the real dataset has millions of rows. Is there a built-in function in pandas or Python that can solve this?

Solution:

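Assuming workers must share exactly the same location to count as standing close by, grouping by the location columns matches them efficiently. Note that in the data below, user 302's coordinates are set to 55.6621, 56.6621, identical to user 101's; with the question's values (55.6622, 56.6622) the two would fall into separate groups and no pair would be found.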

from itertools import combinations

import pandas as pd

d = {'user_id': [11, 24, 101, 214, 302, 335],
     'worker_latitude': [-34.6209, -2.7572, 55.6621,
                         55.114462, 55.6621, -34.6209],
     'worker_longitude': [-58.3742, 52.3879, 56.6621, 38.927156,
                          56.6621, 39.018]}
df = pd.DataFrame(data=d)

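# Group rows that share identical coordinates and list every pair of
# user_ids within each group (a group with one worker yields an empty list).
matched_workers = df.groupby(['worker_latitude', 'worker_longitude']).apply(
    lambda rows: list(combinations(rows['user_id'], r=2)))
# Keep only the locations where at least one pair was found.
matched_workers = matched_workers.loc[matched_workers.apply(bool)]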

matched_workers then contains:

worker_latitude  worker_longitude
55.6621          56.6621             [(101, 302)]
dtype: object
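The groupby approach only pairs workers whose coordinates are exactly equal, so with the question's original values (user 302 at 55.6622, 56.6622) it finds no pair at all. If "close" should mean "within some tolerance", one possible approach is to bucket points into a grid and compare each point only with points in neighbouring cells. This is a minimal sketch, assuming a tolerance `tol` in degrees applied separately to latitude and longitude (not a distance in metres); `find_close_pairs` is a hypothetical helper, not a pandas function.

```python
import numpy as np
import pandas as pd


def find_close_pairs(df, tol):
    """Return sorted (user_id, user_id) pairs whose latitudes and longitudes
    each differ by at most `tol` degrees.

    Hypothetical helper (not a pandas API): points are bucketed into square
    grid cells `tol` degrees wide, so any close pair must fall in the same
    cell or in one of the 8 neighbouring cells.
    """
    pts = pd.DataFrame({
        'user_id': df['user_id'].to_numpy(),
        'lat': df['worker_latitude'].to_numpy(),
        'lon': df['worker_longitude'].to_numpy(),
    })
    # Integer grid cell of each point.
    pts['cx'] = np.floor(pts['lat'] / tol).astype(np.int64)
    pts['cy'] = np.floor(pts['lon'] / tol).astype(np.int64)

    candidates = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            # Shift a copy by (dx, dy) so the merge pairs every point with
            # the points lying in one particular neighbouring cell.
            shifted = pts.assign(cx=pts['cx'] + dx, cy=pts['cy'] + dy)
            candidates.append(
                pts.merge(shifted, on=['cx', 'cy'], suffixes=('_a', '_b')))
    cand = pd.concat(candidates, ignore_index=True)

    # Drop self-matches and keep each unordered pair only once.
    cand = cand[cand['user_id_a'] < cand['user_id_b']]
    # Cells only yield candidates; confirm each one against the tolerance.
    close = (((cand['lat_a'] - cand['lat_b']).abs() <= tol)
             & ((cand['lon_a'] - cand['lon_b']).abs() <= tol))
    pairs = cand.loc[close, ['user_id_a', 'user_id_b']].drop_duplicates()
    return sorted(zip(pairs['user_id_a'].tolist(), pairs['user_id_b'].tolist()))


# The question's original data (user 302 at 55.6622, 56.6622).
d = {'user_id': [11, 24, 101, 214, 302, 335],
     'worker_latitude': [-34.6209, -2.7572, 55.6621,
                         55.114462, 55.6622, -34.6209],
     'worker_longitude': [-58.3742, 52.3879, 56.6621, 38.927156,
                          56.6622, 39.018]}
df = pd.DataFrame(data=d)

print(find_close_pairs(df, tol=0.001))  # [(101, 302)]
```

Each point is only compared with points in its own and the eight surrounding cells, so the work grows with the number of nearby points rather than with n²; a `tol` that is large relative to the point density will still produce many candidates. Unlike rounding the coordinates and grouping, the neighbour check also catches pairs that straddle a cell boundary. If "close" should be a real distance in metres, a spatial index such as SciPy's `cKDTree.query_pairs` or scikit-learn's `BallTree` with `metric='haversine'` may be a better fit.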