Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Finding value in a column and delete the value if not found in pandas

Comparing the columns df1_user and df2_user. If multiple values of df2_user is found in df1_user then merge those values in a new column with a delimiter.
I have a dataframe (df1) look like this:

df1:

df1_ID  df1_Tag df1_Info    df1_user
   1    Test1   Pass        100,200,300
   2    Test2   Pass        400,500,600,700
   3    Test3   Fail        800,900,1000,1100,1200
   4    Test4   Pass        1300
   5    Test5   Pass        1400,1500

df2:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df2_user
100
300
500
600
700
1100
1200
1300
1400
1600

The result dataframe should look like this:

df3:

df1_ID  df1_Tag df1_Info    df3_user
1        Test1  Pass         100,300
2        Test2  Pass         500,600,700
3        Test3  Fail         1100,1200
4        Test4  Pass         1300
5        Test5  Pass         1400

The code looks like this:

for name in df2['df2_user'].to_list():
    df1.loc[ df1['df1_user'].str.contains(name), 'df3_user' ] = name

I am having issues merging the multiple values with a delimiter.

>Solution :

Use custom lambda function for test if match splitted values in set from column df2['df2_user']:

s = set(df2['df2_user'].astype(str))

f = lambda x: (','.join(y for y in x.split(',') if y in s))
df1['df3_user'] = df1['df1_user'].apply(f)
print (df1)
   df1_ID df1_Tag df1_Info                df1_user     df3_user
0       1   Test1     Pass             100,200,300      100,300
1       2   Test2     Pass         400,500,600,700  500,600,700
2       3   Test3     Fail  800,900,1000,1100,1200    1100,1200
3       4   Test4     Pass                    1300         1300
4       5   Test5     Pass               1400,1500         1400

Or use DataFrame.explode with filter rows by Series.isin, last join back by GroupBy.agg with join:

df1['df3_user'] = (df1.assign(df1_user = df1['df1_user'].str.split(','))
                      .explode('df1_user')
                      .loc[lambda x: x['df1_user'].isin(df2['df2_user'].astype(str))]
                      .groupby(level=0)['df1_user']
                      .agg(','.join))
print (df1)
   df1_ID df1_Tag df1_Info                df1_user     df3_user
0       1   Test1     Pass             100,200,300      100,300
1       2   Test2     Pass         400,500,600,700  500,600,700
2       3   Test3     Fail  800,900,1000,1100,1200    1100,1200
3       4   Test4     Pass                    1300         1300
4       5   Test5     Pass               1400,1500         1400
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading