Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Not able to figure-out where I am making a mistake

I have one data frame column dtc_mined which contains value like below seperated by |

P18A253|P18A0|P18A2|P18A043|P2B61

here it contains some values some of its length is 5 (P18A2) or some of its length is 7 (P18A043) now I want to check if the value of 5 length word is available in the value of 7 length words
then the 5 length word should be removed

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Below is my expected output

P18A253|P18A043|P2B61

Below is my code which I have tried

import pandas as pd

# Sample DataFrame
data = {'dtc_mined': ['P18A253|P18A0|P18A2|P18A043|P2B61']}
df = pd.DataFrame(data)

# Split the values and create sets of 5 and 7 character words
df['split_values'] = df['dtc_mined'].str.split('|')
df['words_5'] = df['split_values'].apply(lambda lst: set(word for word in lst if len(word) == 5))
df['words_7'] = df['split_values'].apply(lambda lst: set(word for word in lst if len(word) == 7))

# Remove 5-character words that have a corresponding 7-character word
df['filtered_values'] = df.apply(lambda row: '|'.join(word for word in row['split_values'] if len(word) == 7 or word not in row['words_7']), axis=1)

# Drop intermediate columns and display the result
result = df.drop(['split_values', 'words_5', 'words_7'], axis=1)
print(result)

I also tried another approach

Remove 5-character words that have a corresponding 7-character word

def Check1(row):
    for word in row['words_5']:
        if word not in row['words_7']:
            row['words_7'].add(word)
    return row['words_7']

df['filtered_values'] = df.apply(Check1, axis=1)

>Solution :

This is probably most easily done by doing everything in a function and applying that to the values:

def filter_words(ll):
    words = ll.split('|')
    w7 = set(w for w in words if len(w) == 7)
    return '|'.join(w for w in words if w in w7 or not any(w2.startswith(w) for w2 in w7))

This function forms a set of the 7 letter words, then filters the word list based on whether the word is in that set, or none of the words in the set start with the same letters.

To use, just apply:

df['filtered_values'] = df['dtc_mined'].apply(filter_words)

Output (for your sample data):

                           dtc_mined        filtered_values
0  P18A253|P18A0|P18A2|P18A043|P2B61  P18A253|P18A043|P2B61
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading