I have two columns – one with sentences and the other with single words.
| Sentence | word |
|---|---|
| "Such a day! It’s a beautiful day out there" | "beautiful" |
| "Such a day! It’s a beautiful day out there" | "day" |
| "I am sad by the sad weather" | "weather" |
| "I am sad by the sad weather" | "sad" |
I want to count how many times each value in the "word" column occurs in the "Sentence" value on the same row,
and achieve this output:
| Sentence | word | n |
|---|---|---|
| "Such a day! It’s a beautiful day out there" | "beautiful" | 1 |
| "Such a day! It’s a beautiful day out there" | "day" | 2 |
| "I am sad by the sad weather" | "weather" | 1 |
| "I am sad by the sad weather" | "sad" | 2 |
I tried:
ok = []
for l in [x.split() for x in df['Sentence']]:
    for y in df['word']:
        ok.append(l.count(y))
However, it never seems to finish running and takes a very long time, so it is not feasible for my actual dataset, which has 50k rows.
Can anyone help me achieve this?
>Solution :
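You can do it with zip, which pairs each sentence with the word from the same row, so you count once per row instead of checking every word against every sentence:
df['new'] = [x.count(y) for x, y in zip(df.Sentence, df.word)]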
df
Out[419]:
                                     Sentence       word  new
0  Such a day! It's a beautiful day out there  beautiful    1
1  Such a day! It's a beautiful day out there        day    2
2                 I am sad by the sad weather    weather    1
3                 I am sad by the sad weather        sad    2
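One caveat: str.count counts substrings, so "day" would also be counted inside "Today". If you only want whole words, a regex with word boundaries is one option. This is a minimal sketch, not part of the answer above: the helper name count_whole_word and the fifth sample row are made up for illustration, and it assumes the words consist of ordinary word characters (for something like "C++" the \b boundary would not behave as expected). It is also case-sensitive.

```python
import re

def count_whole_word(sentence, word):
    """Count occurrences of `word` in `sentence` as a whole word, not as a substring."""
    # re.escape protects against regex metacharacters in the word;
    # \b requires a word boundary on each side of the match.
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, sentence))

# Plain lists standing in for df['Sentence'] and df['word'].
sentences = [
    "Such a day! It's a beautiful day out there",
    "Such a day! It's a beautiful day out there",
    "I am sad by the sad weather",
    "I am sad by the sad weather",
    "Today is a sad day",  # hypothetical extra row to show the difference
]
words = ["beautiful", "day", "weather", "sad", "day"]

# One count per row, pairing each sentence with its own word.
counts = [count_whole_word(s, w) for s, w in zip(sentences, words)]
print(counts)  # [1, 2, 1, 2, 1]
```

For the last row, "Today is a sad day".count("day") gives 2, while the whole-word version gives 1. On the DataFrame, the same list comprehension should work with zip(df['Sentence'], df['word']) in place of the two lists.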