I have a dataframe and a list as follows:
import pandas as pd

df = pd.DataFrame({'data1': ['the weather is nice today', 'This is interesting', 'the weather is good'],
                   'data2': ['It is raining', 'The plant is greenery', 'the weather is sunnyday']})
my_list = ['sunny', 'green']
I would like to replace the last word of each row in the data2 column with a word from my_list, if that last word starts with it. This is what I tried:
for k in my_list:
    for val in df.data2:
        if val.split()[-1].startswith(k):
            print(val.replace(val.split()[-1], k))
But when I print the results, they come out in the order of my_list rather than the row order, and I do not know how to assign them back to the same column. My desired output is:
data1 data2
0 the weather is nice today It is raining
1 This is interesting The plant is green
2 the weather is good the weather is sunny
Solution:
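One possible way is to build a single regex that matches a last word starting with any of the words in your list, and to keep only the matched prefix. Because str.replace applies the pattern to the whole column in one call, this avoids a nested Python loop over the list and the rows, and the result stays aligned with the original row order, so it can be assigned straight back.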
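import re

# Group 1 is one of the list words (escaped in case they contain regex metacharacters);
# \S+$ consumes the rest of the final word, which is then replaced by group 1 alone.
pat = re.compile(rf"\b({'|'.join(map(re.escape, my_list))})\S+$")
dfnew = df.assign(data2=df['data2'].str.replace(pat, r'\1', regex=True))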
>>> dfnew
data1 data2
0 the weather is nice today It is raining
1 This is interesting The plant is green
2 the weather is good the weather is sunny
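If you prefer to keep the loop logic from the question, a minimal sketch is to move it into a per-value function and apply it with Series.map, which processes the rows in their original order and returns a Series that can be assigned back to data2. The helper name trim_last_word is hypothetical, chosen only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'data1': ['the weather is nice today', 'This is interesting', 'the weather is good'],
                   'data2': ['It is raining', 'The plant is greenery', 'the weather is sunnyday']})
my_list = ['sunny', 'green']


def trim_last_word(text, prefixes):
    """Hypothetical helper: swap the last word of `text` for the first prefix it starts with."""
    words = text.split()
    if not words:
        return text
    for prefix in prefixes:
        if words[-1].startswith(prefix):
            # Rebuild the sentence so only the final word changes
            return ' '.join(words[:-1] + [prefix])
    return text  # no prefix matched: leave the row unchanged


# Series.map works row by row, so the output keeps the original row order
df['data2'] = df['data2'].map(lambda s: trim_last_word(s, my_list))
print(df)
```

Rebuilding the sentence from the split words, instead of calling val.replace(last_word, k), avoids also changing an earlier occurrence of the same word in the sentence. Note that ' '.join collapses runs of whitespace into single spaces, which the regex version does not do.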