I have a dataset. In the column ‘Tags’ I want to extract from each row all the content that contains the word player. It could repeat or appear alone in the same cell. Something like this:
‘view_snapshot_hi:hab,like_hi:hab,view_snapshot_foinbra,completed_profile,view_page_investors_landing,view_foinbra_inv_step1,view_foinbra_inv_step2,view_foinbra_inv_step3,view_snapshot_acium,player,view_acium_inv_step1,view_acium_inv_step2,view_acium_inv_step3,player_acium-ronda-2_r1,view_foinbra_rinv_step1,view_page_makers_landing’
expected output:
‘player,player_acium-ronda-2_r1’
And I need both.
df["Tags"] = df["Tags"].str.ectract(r'*player'*,?\s*')
I tried this but it's not working.
>Solution :
You can use Series.str.extract, keeping in mind that the pattern must contain a capturing group around the part you want to extract.
The pattern you need is player[^,]*, which matches player followed by any run of characters other than a comma:
df["Tags"] = df["Tags"].str.extract(r'(player[^,]*)', expand=False)
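Passing expand=False returns a Series rather than a DataFrame. Note that str.extract returns only the first match in each row, so for the sample row it yields player alone. To collect every match, use Series.str.findall with the same pattern (no capturing group needed) and join the resulting lists with Series.str.join(',').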
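Since the expected output contains both tags, it may help to compare str.extract with str.findall side by side. The following is a minimal sketch, not a definitive implementation: the two-row DataFrame is made-up sample data shortened from the question, and the variable names first_only and all_matches are only illustrative.

```python
import pandas as pd

# Hypothetical sample data, shortened from the row shown in the question
df = pd.DataFrame({"Tags": [
    "view_snapshot_acium,player,view_acium_inv_step1,player_acium-ronda-2_r1,view_page_makers_landing",
    "completed_profile,view_page_investors_landing",
]})

# str.extract keeps only the first match per row (NaN when nothing matches)
first_only = df["Tags"].str.extract(r"(player[^,]*)", expand=False)

# str.findall collects every match in a list; str.join glues each list back into one string
all_matches = df["Tags"].str.findall(r"player[^,]*").str.join(",")

print(first_only.tolist())
print(all_matches.tolist())
```

Rows without any match end up as an empty string in all_matches. Also note that the pattern matches player anywhere inside a tag, so a tag such as multiplayer_x would contribute player_x; if that matters for your data, the pattern would need an anchor on the preceding comma or start of string.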