Building dictionary of unique IDs for pairs of matching strings

I have a dataframe like this:

# Test dataframe
import pandas as pd
import numpy as np

# Build df
df = pd.DataFrame({
    'Title': ['title1', 'cat', 'dog'],
    'References': [['donkey', 'chicken'], ['title1', 'dog'], ['bird', 'snake']],
})

# Insert 1-based IDs for the unique titles
df['IDs'] = np.arange(1, len(df) + 1)
df = df[['Title', 'IDs', 'References']]


I want to generate a list of IDs for each entry in the References column. If a reference matches a title, it should get the same ID as in the IDs column; if not, it should get a new unique ID.



My first attempt uses this function:

#Matching function
def string_match(string1,string2):
    if string1 == string2:
        a = 1
    else:
        a = 0

    return a

I then loop over each string/title combination, but this gets tricky with multiple for loops and if statements. Is there a better, more Pythonic way to do this?

Solution:

# Explode to one reference per row
references = df["References"].explode()

# Combine existing titles with the new titles from References
titles = pd.concat([df["Title"], references]).unique()

# Assign each title an index number
mappings = {t: i + 1 for i, t in enumerate(titles)}

# Map the reference to the index number and convert to list
df["RefIDs"] = references.map(mappings).groupby(level=0).apply(list)
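
The snippet above numbers titles by position, so it only matches the IDs column when the titles are unique and the IDs run 1..n in row order, as they do in the question. A minimal sketch of a variant that reuses whatever is already in IDs is shown below. The ID values 10, 20, 30 and the names `next_id` and `RefIDs` are illustrative assumptions, not part of the original question.

```python
import pandas as pd

# Same data as the question, but with IDs that are deliberately not 1..n
# (hypothetical values, chosen only to show that existing IDs are reused).
df = pd.DataFrame({
    "Title": ["title1", "cat", "dog"],
    "IDs": [10, 20, 30],
    "References": [["donkey", "chicken"], ["title1", "dog"], ["bird", "snake"]],
})

# Seed the mapping with the IDs already assigned to the titles.
mappings = dict(zip(df["Title"], df["IDs"].tolist()))

# Give every reference that is not a known title the next free ID,
# in order of first appearance.
next_id = int(df["IDs"].max()) + 1
for ref in df["References"].explode().dropna().unique():
    if ref not in mappings:
        mappings[ref] = next_id
        next_id += 1

# Translate each list of references into a list of IDs.
df["RefIDs"] = df["References"].apply(lambda refs: [mappings[r] for r in refs])
```

With this sample data, `df["RefIDs"]` comes out as `[31, 32]`, `[10, 30]`, `[33, 34]`: `title1` and `dog` keep their existing IDs, and the four unseen references get new ones.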