filtered_df and str in Python

I am new to Python and I'm trying to filter a dataset. The filter seems to work, or at least I think it does :)

valid_Cas = ["yut", "thj", "bnm","vfd"]
filtered_df = df[df['Cas ID'].str[-3:].isin(valid_Cas)]

But when I add a value with more than three letters, it does not work, for example:

valid_Cas = ["yut", "thj", "bnm","vfd","cdret"]
filtered_df = df[df['Cas ID'].str[-3:].isin(valid_Cas)]

What does str[-3:] mean?

How can I filter on more than 3 letters?

Does the code filter "bnm5623" and "5623bnm", or does it leave them?

Thank you.

> Solution:

What does str[-3:] mean?

str[-3:] is a slicing operation which means "take the last 3 characters of the string". For example, given the string "abcde", "abcde"[-3:] results in "cde". df['Cas ID'].str[-3:] performs this slicing operation on each element of the column in the dataframe.
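A small plain-Python sketch of the slicing rule (the variable names are only for illustration). Note that the stop index is left empty; writing 0 there would give an empty string instead.

```python
cas_id = "abcde"

# Negative indices count from the end, so [-3:] runs from the
# third-to-last character up to the end of the string.
last_three = cas_id[-3:]      # "cde"
last_five = cas_id[-5:]       # "abcde"

# A string shorter than the slice is returned whole, not padded.
short = "ab"[-3:]             # "ab"

# With an explicit stop of 0 the slice is empty.
empty = cas_id[-3:0]          # ""
```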

How can I filter on more than 3 letters?

To filter on a different number of characters, adjust the slice to the length of the string you are looking for. For example, to match strings that end with 'cdret' you would use str[-5:], because 'cdret' has a length of 5. Keep in mind that a single slice compares one length at a time, so str[-5:] no longer matches the three-letter entries.

Does the code filter "bnm5623" and "5623bnm", or does it leave them?

The code df['Cas ID'].str[-3:].isin(valid_Cas) only checks the last three characters of each entry in the 'Cas ID' column against your valid_Cas list. So it keeps '5623bnm', because its last three characters are 'bnm', which is in your list. It drops 'bnm5623', because its last three characters are '623'; that entry would only be kept if '623' were in your list.
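This can be checked on a tiny dataframe. The sample IDs below are made up for illustration; only the column name 'Cas ID' and the valid_Cas list come from the question.

```python
import pandas as pd

# Hypothetical sample data, not the asker's real dataset.
df = pd.DataFrame({"Cas ID": ["bnm5623", "5623bnm", "0001yut", "77cdret"]})
valid_Cas = ["yut", "thj", "bnm", "vfd"]

# Last three characters of every entry in the column.
last_three = df["Cas ID"].str[-3:]

# True where those three characters appear in valid_Cas.
mask = last_three.isin(valid_Cas)

filtered_df = df[mask]
kept = filtered_df["Cas ID"].tolist()
print(kept)
```

Only '5623bnm' and '0001yut' survive here: 'bnm5623' ends in '623' and '77cdret' ends in 'ret', neither of which is in the list.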

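To match endings of different lengths at the same time, a single slice is not enough. Slicing to the longest string in your list (5 here) and calling isin would stop matching the three-letter entries, because a slice such as '01yut' is not in the list. Instead, check whether each entry ends with any of the valid strings:

valid_Cas = ["yut", "thj", "bnm", "vfd", "cdret"]
suffixes = tuple(valid_Cas)  # Python's str.endswith accepts a tuple of suffixes

# True where the 'Cas ID' ends with any suffix; non-string values (e.g. NaN) count as no match
mask = df['Cas ID'].map(lambda s: isinstance(s, str) and s.endswith(suffixes))
filtered_df = df[mask]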
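As a quick check on made-up data, the sketch below compares slicing to one fixed length with an ends-with test over all the valid strings. The sample IDs and the names by_slice and by_suffix are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample data, not the asker's real dataset.
df = pd.DataFrame({"Cas ID": ["0001yut", "77cdret", "bnm5623", "cdret"]})
valid_Cas = ["yut", "thj", "bnm", "vfd", "cdret"]

# Approach 1: slice every ID to the length of the longest valid string.
max_length = max(len(s) for s in valid_Cas)
sliced = df["Cas ID"].str[-max_length:]
by_slice = df[sliced.isin(valid_Cas)]["Cas ID"].tolist()

# Approach 2: test whether each ID ends with any of the valid strings.
suffixes = tuple(valid_Cas)
mask = df["Cas ID"].map(lambda s: isinstance(s, str) and s.endswith(suffixes))
by_suffix = df[mask]["Cas ID"].tolist()

print(by_slice)
print(by_suffix)
```

With a fixed five-character slice, '0001yut' is lost because its slice is '01yut'. The ends-with test keeps it along with both 'cdret' entries. As far as I know, pandas 1.5 and later also accept a tuple directly in Series.str.endswith, which would be a shorter way to write the second approach if your version supports it.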