Find words in an array in a specific order in a pandas DataFrame

I have a dataframe:

import pandas as pd
data = {
    'token_1': [['cat', 'run', 'today'], ['dog', 'eat', 'meat']],
    'token_2': [
        ['cat', 'in', 'the', 'morning', 'cat', 'run', 'today',
         'very', 'quick', 'cat', 'today', 'jump', 'and', 'run', 'run', 'cat', 'today'],
        ['dog', 'eat', 'meat', 'chicken', 'from', 'bowl', 'dog', 'see', 'meat', 'eat'],
    ],
}


df = pd.DataFrame(data)

To find the words from the token_1 column in the token_2 column lists, I use this:

lst_index = [[i for i, x in enumerate(b) if x in a] for a, b in zip(df['token_1'], df['token_2'])]
print(lst_index)

This gives me every index at which any of the words occurs:

[[0, 4, 5, 6, 9, 10, 13, 14, 15, 16], [0, 1, 2, 6, 8, 9]]

But I need only the indexes at which the words appear, preferably in the same order as in the token_1 list, so the result should be just:

[[4,5,6], [0,1,2]]

Solution:

You can use a custom function to find the position of the first matching sublist in the other list. If there is no match, the function falls through and returns None:

def sublist(l, l_ref):
    # for each word in the list
    for pos, word in enumerate(l):
        # if we have enough words left to compare
        # and if it matches the first word of the reference
        if pos <= len(l)-len(l_ref) and word == l_ref[0]:
            # if all the next N words match (N being the length of the ref)
            if all(a==b for a,b in zip(l[pos:pos+len(l_ref)], l_ref)):
                return list(range(pos, pos+len(l_ref)))

[sublist(l2, l1) for l1, l2 in zip(df['token_1'], df['token_2'])]

Output:

[[4, 5, 6], [0, 1, 2]]
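
If more than one contiguous occurrence may exist, or if an empty result is easier to handle than None, a variant along the following lines might help. This is only a sketch and not part of the original answer: the name find_runs is hypothetical, and it assumes both inputs are plain Python lists like the ones in the question (copied here without pandas so the snippet runs on its own).

```python
def find_runs(tokens, ref):
    """Return the index lists of every contiguous occurrence of ref in tokens.

    Hypothetical helper: returns [] when ref is empty or never occurs.
    """
    n = len(ref)
    if n == 0:
        return []
    # Slide a window of length n over tokens and keep the start
    # positions where the window equals the reference list.
    return [
        list(range(pos, pos + n))
        for pos in range(len(tokens) - n + 1)
        if tokens[pos:pos + n] == ref
    ]


# Same data as in the question, as plain lists.
token_1 = [['cat', 'run', 'today'], ['dog', 'eat', 'meat']]
token_2 = [
    ['cat', 'in', 'the', 'morning', 'cat', 'run', 'today',
     'very', 'quick', 'cat', 'today', 'jump', 'and', 'run', 'run', 'cat', 'today'],
    ['dog', 'eat', 'meat', 'chicken', 'from', 'bowl', 'dog', 'see', 'meat', 'eat'],
]

result = [find_runs(l2, l1) for l1, l2 in zip(token_1, token_2)]
print(result)  # [[[4, 5, 6]], [[0, 1, 2]]]
```

Each row now holds a list of matches rather than a single match, so a row without any match yields []. With the dataframe from the question, the same comprehension should work over zip(df['token_1'], df['token_2']).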