Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Manual select an ID to return all possible connections

The dataframe

   ColumnA                     TempB   value
0  2149712                    1431291    7.7
1  2149712                    1421222    6.3
4  2149712                    5212412    5.1
5  1421222                    1451826   15.1
6  1421222                    3124512   12.5
7  1421221                    2141241   11.2
8  4312125                    5212412    4.9

shows on ColumnA and TempB some identifiers (customers ID).
I need to create a step of code that can allow users to select manually an ID (it does not matter if from ColumnA or TempB) and return all the possible connections with other identifies, if any. The results should be sorted by value (descending).

What I have tried so far is shown below:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

select_id = input('Enter IDs to look for (separated by space): ')
ids_list = select_id.split()

pattern = '|'.join(r"\b{}\b".format(x) for x in ids_list)
df = df[(df['ColumnA'].str.contains(pattern)) | (df['TempB'].str.contains(pattern))] 
df= df[["ColumnA", "TempB", "value"]].sort_values(by="value", ascending=False)

For instance: let’s say that I select ID= 1421222.
I would expect the following output:

ID               Link     value
1421222         1451826   15.1
1421222         3124512   12.5
1421222         2149712    6.3

or in case of multiple IDs (e.g. 1421222 and 4312125):

ID               Link     value
1421222         1451826   15.1
1421222         3124512   12.5
1421222         2149712    6.3
4312125         5212412    4.9

I think it is easier that what I think to generate the above outputs, but actually I am finding difficult to reproduce the logic of selection.

>Solution :

subset = df[df[['ColumnA', 'TempB']].isin([1421222, 4312125]).sum(axis=1).astype(bool)]

Output:

>>> subset
   ColumnA    TempB  value
1  2149712  1421222    6.3
5  1421222  1451826   15.1
6  1421222  3124512   12.5
8  4312125  5212412    4.9
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading