I have the following list and a DataFrame:
the_list = ["one", "et", "allu", "Metall", "54ro", 'al89']
df = pd.DataFrame({ 'ID':[100, 200, 300, 400],
'String':['Jonel-al89 (et)', 'Stel-00(et) al89 x 57-mm', 'Metall, 54ro', "allu, Metall9(lop)"]
})
I need to create a new column that contains all the elements from the list that occur in each string in the "String" column.
The output should look like this:
ID | String | Desired_Column |
---|---|---|
100 | Jonel-al89 (et) | one, al89, et |
200 | Stel-00(et) al89 x 57-mm | et, al89 |
300 | Metall, 54ro | et, Metall, 54ro |
400 | allu, Metall9(lop) | allu, et, Metall |
What would be the way to achieve it?
Any help would be much appreciated!
>Solution :
You don’t even need regex: a list comprehension can check which elements of your list occur as substrings in each value of the String column.
I’m not sure whether you want the matches as a list or as a string; if you want a string, wrap the comprehension in a str.join call.
import pandas as pd
the_list = ["one", "et", "allu", "Metall", "54ro", 'al89']
df = pd.DataFrame({ 'ID':[100, 200, 300, 400],
'String':['Jonel-al89 (et)', 'Stel-00(et) al89 x 57-mm', 'Metall, 54ro', "allu, Metall9(lop)"]
})
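# For each string, keep the elements of the_list that occur in it as substrings
# (matches come out in the order of the_list, not in the order they appear in the string)
df["Desired_Column"] = df["String"].apply(lambda string: [el for el in the_list if el in string])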
df
# gives
# ID String Desired_Column
# 0 100 Jonel-al89 (et) [one, et, al89]
# 1 200 Stel-00(et) al89 x 57-mm [et, al89]
# 2 300 Metall, 54ro [et, Metall, 54ro]
# 3 400 allu, Metall9(lop) [et, allu, Metall]
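If you want a comma-separated string instead of a list, as in the table in the question, the str.join variant mentioned above could look roughly like this. It is only a sketch: the helper name `matches_as_string` and the `", "` separator are my own choices for illustration, not something from the question.

```python
import pandas as pd

the_list = ["one", "et", "allu", "Metall", "54ro", "al89"]
df = pd.DataFrame({
    "ID": [100, 200, 300, 400],
    "String": ["Jonel-al89 (et)", "Stel-00(et) al89 x 57-mm",
               "Metall, 54ro", "allu, Metall9(lop)"],
})

# Hypothetical helper: joins every element of `candidates` that occurs
# as a substring of `text`, keeping the order of `candidates`.
def matches_as_string(text, candidates=the_list, sep=", "):
    return sep.join(el for el in candidates if el in text)

df["Desired_Column"] = df["String"].apply(matches_as_string)
print(df)
```

Note that `in` does plain substring matching, which is why "et" is found inside "Metall" and "one" inside "Jonel". That matches the expected output in the question, but it would need a different check if only whole words should count.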