Not able to figure out where I am making a mistake

August 18, 2023

I have a DataFrame column, dtc_mined, which contains values like the one below, separated by |:

P18A253|P18A0|P18A2|P18A043|P2B61

Some of the values have length 5 (e.g. P18A2) and some have length 7 (e.g. P18A043). I want to check whether each 5-character value is contained in any of the 7-character values (for example, P18A2 is the start of P18A253). If it is, the 5-character value should be removed.

Below is my expected output

P18A253|P18A043|P2B61

Below is the code I have tried:

import pandas as pd

# Sample DataFrame
data = {'dtc_mined': ['P18A253|P18A0|P18A2|P18A043|P2B61']}
df = pd.DataFrame(data)

# Split the values and create sets of 5 and 7 character words
df['split_values'] = df['dtc_mined'].str.split('|')
df['words_5'] = df['split_values'].apply(lambda lst: set(word for word in lst if len(word) == 5))
df['words_7'] = df['split_values'].apply(lambda lst: set(word for word in lst if len(word) == 7))

# Remove 5-character words that have a corresponding 7-character word
df['filtered_values'] = df.apply(lambda row: '|'.join(word for word in row['split_values'] if len(word) == 7 or word not in row['words_7']), axis=1)

# Drop intermediate columns and display the result
result = df.drop(['split_values', 'words_5', 'words_7'], axis=1)
print(result)

I also tried another approach:

# Remove 5-character words that have a corresponding 7-character word

def Check1(row):
    for word in row['words_5']:
        if word not in row['words_7']:
            row['words_7'].add(word)
    return row['words_7']

df['filtered_values'] = df.apply(Check1, axis=1)

>Solution :

This is probably most easily done by doing everything in a function and applying that to the values:

def filter_words(ll):
    words = ll.split('|')
    w7 = set(w for w in words if len(w) == 7)
    return '|'.join(w for w in words if w in w7 or not any(w2.startswith(w) for w2 in w7))

This function builds a set of the 7-character words, then keeps a word only if it is in that set or if no word in the set starts with it.
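As a quick check outside pandas, the function can be called directly on the sample string. The sketch below repeats the definition so that it runs on its own, and also shows why an exact membership test (as in the question's `word not in row['words_7']`) never matches a 5-character value, whereas a prefix test does. The names `sample`, `w7_sample`, and `result` are only illustrative.

```python
def filter_words(ll):
    words = ll.split('|')
    # Set of all 7-character words in this row
    w7 = set(w for w in words if len(w) == 7)
    # Keep 7-character words, plus any word that no 7-character word starts with
    return '|'.join(w for w in words if w in w7 or not any(w2.startswith(w) for w2 in w7))

sample = 'P18A253|P18A0|P18A2|P18A043|P2B61'
w7_sample = {'P18A253', 'P18A043'}

# Exact membership: a 5-character word is never equal to a 7-character word
print('P18A2' in w7_sample)                            # False
# Prefix test: this is the comparison the task needs
print(any(w2.startswith('P18A2') for w2 in w7_sample))  # True

result = filter_words(sample)
print(result)  # P18A253|P18A043|P2B61
```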

To use it, apply the function to the column:

df['filtered_values'] = df['dtc_mined'].apply(filter_words)

Output (for your sample data):

                           dtc_mined        filtered_values
0  P18A253|P18A0|P18A2|P18A043|P2B61  P18A253|P18A043|P2B61
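If rows hold many codes, the `any(...)` scan over every 7-character word may become noticeable. One possible variant, sketched here under the assumption that only the first 5 characters of a 7-character code matter, collects those prefixes into a set once per row. The name `filter_words_prefix_set` is hypothetical, and the behaviour differs from `filter_words` for words that are neither 5 nor 7 characters long (those are always kept here).

```python
def filter_words_prefix_set(ll):
    words = ll.split('|')
    # Assumption: a short code is redundant only if it equals the first
    # 5 characters of some 7-character code in the same row
    prefixes = {w[:5] for w in words if len(w) == 7}
    # Keep every 7-character code, and any other code not among the prefixes
    return '|'.join(w for w in words if len(w) == 7 or w not in prefixes)

rows = ['P18A253|P18A0|P18A2|P18A043|P2B61', 'P2B61|P2B6112', 'P0001']
filtered = [filter_words_prefix_set(r) for r in rows]
print(filtered)
```

On a DataFrame it would be applied the same way as above, e.g. `df['dtc_mined'].apply(filter_words_prefix_set)`.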

regex-lookarounds

byMR

Published August 18, 2023
