I have the following dataframe:
import pandas as pd
df = pd.DataFrame({'X': ['Ciao, I would like to count the number of occurrences in this text considering negations that can change the meaning of the sentence',
"Hello, not number of negations, in this case we need to take care of the negation.",
"Hello world, don't number is another case in which where we need to consider negations."]})
I would like to count how many times a string appears in those sentences. So I simply do:
d = pd.DataFrame(['need'], columns = ['D'])
df['X'].str.count('|'.join(d.append({'D': 'number'}, ignore_index = True).D))
0 1
1 2
2 2
Name: X, dtype: int64
However, in my application I need to loop over each element of df, which means:
res=[]
for i in range(len(df)):
    f = df['X'][i].count('|'.join(d.append({'D': 'number'}, ignore_index = True).D))
    res.append(f)
[0,0,0]
I get two different results. The first one is obviously correct.
How can I fix it?
Thanks!
>Solution :
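Inside the loop, df['X'][i] is a plain Python string, and its built-in count method searches for the literal substring 'need|number', which never occurs. Series.str.count, by contrast, interprets the pattern as a regular expression. To get the same result in a loop, count regex matches with re.findall, for example using iterrows: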
import re
words = ['need', 'number']
res = {}
for idx, row in df.iterrows():
    count = len(re.findall('|'.join(words), row['X']))
    res[idx] = count
df['count'] = pd.Series(res)
Output:
>>> df
                                                   X  count
0  Ciao, I would like to count the number of occu...      1
1  Hello, not number of negations, in this case w...      2
2  Hello world, don't number is another case in w...      2
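As a quick sanity check, here is a minimal sketch that reproduces both results without pandas, using only the three sentences from the question. The names `sentences`, `pattern`, `literal_counts` and `regex_counts` are just illustrative, and wrapping each word in `re.escape` is an extra precaution I added in case a search word ever contains regex metacharacters; it is not required for 'need' and 'number'.

```python
import re

sentences = [
    "Ciao, I would like to count the number of occurrences in this text "
    "considering negations that can change the meaning of the sentence",
    "Hello, not number of negations, in this case we need to take care of the negation.",
    "Hello world, don't number is another case in which where we need to consider negations.",
]
words = ["need", "number"]

# Escape each word so it is matched literally, then join with "|" (regex alternation).
pattern = re.compile("|".join(re.escape(w) for w in words))

# Built-in str.count looks for the literal text "need|number", which never occurs.
literal_counts = [s.count("|".join(words)) for s in sentences]

# A regex search treats "|" as "either word", like Series.str.count does.
regex_counts = [len(pattern.findall(s)) for s in sentences]

print(literal_counts)  # [0, 0, 0]
print(regex_counts)    # [1, 2, 2]
```

Compiling the pattern once outside the loop also avoids rebuilding the joined string on every iteration.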