Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Filter dataframe with condition where all substrings in a column start with particular value

I have below data

   system_id           software_versions
0       1001          ['VA100', 'VB101']
1       1001          ['VA100', 'VA101']
2       1001          ['VA100', 'VB102','VC101']

I need to return rows which has software_versions beginning with VA only.

In this example, only second row matches this criteria, as it contains versions VA100 and VA101.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

But other rows have some other version which is not starting with VA

Expected output

   system_id   software_versions
1       1001  ['VA100', 'VA101'] 

I began with this code. I’m not able to proceed further as how I can achieve this.
import pandas as pd

df = pd.DataFrame([[1001,"['VA100', 'VB101']"], [1001,"['VA100', 'VA101']"], [1001,"['VA100', 'VB102','VC101']"]],columns=['system_id', 'software_versions'])

Note: Actual data will come from a csv file, it has more than 1000 entries. I have provided the exact sample data here.

>Solution :

Idea is use generator for test if all values in list starting by VA and filter in boolean indexing:

import ast

df['software_versions'] = df['software_versions'].apply(ast.literal_eval)

df = df[df['software_versions'].map(lambda x: all(y.startswith('VA') for y in x))]
print (df)
   system_id software_versions
1       1001    [VA100, VA101]
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading