I have a problem: I want to count the unique words in a DataFrame column, but unfortunately my code only counts the words in the first sentence.
                          text
0  hello is a unique sentences
1         hello this is a test
2              does this works
import pandas as pd
from collections import Counter

d = {
    "text": ["hello is a unique sentences",
             "hello this is a test",
             "does this works"],
}
df = pd.DataFrame(data=d)

# Count unique words
def counter_word(text_col):
    print(len(text_col.values))
    count = Counter()
    for i, text in enumerate(text_col.values):
        print(i)
        for word in text.split():
            count[word] += 1
        return count  # inside the outer loop: returns after the first row

counter = counter_word(df['text'])
len(counter)
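Counting only the first sentence is consistent with `return count` being indented inside the outer `for` loop, so the function exits after the first row. That is an assumption about the original indentation. A minimal sketch of the loop with the `return` moved out, using a plain list of strings as a stand-in for `df['text']` (iterating over the column yields the same strings):

```python
from collections import Counter

def counter_word(texts):
    """Count whitespace-separated words across all strings in `texts`."""
    count = Counter()
    for text in texts:
        for word in text.split():
            count[word] += 1
    # Return only after every row has been processed.
    return count

# Plain list standing in for df['text'] (same three sentences as above).
sentences = [
    "hello is a unique sentences",
    "hello this is a test",
    "does this works",
]

counter = counter_word(sentences)
print(len(counter))  # 9 unique words
```

With the `return` at function level, all three rows contribute to the count.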
>Solution :
I think it is simpler to join the values with a space, then split the result into words and count them:
counter = Counter((' '.join(df['text'])).split())
print(counter)
Counter({'hello': 2, 'is': 2, 'a': 2, 'this': 2, 'unique': 1, 'sentences': 1, 'test': 1, 'does': 1, 'works': 1})
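If you would rather stay inside pandas, the same counts can probably be obtained without `Counter` by splitting each row into a list of words and exploding the lists into one word per row. This is only a sketch of an alternative, not the approach above. It assumes pandas 0.25 or newer for `Series.explode`, and the names `words` and `counts` are just illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "text": ["hello is a unique sentences",
             "hello this is a test",
             "does this works"],
})

# Split each sentence on whitespace, then put every word on its own row.
words = df["text"].str.split().explode()

# Frequency of each word, most frequent first.
counts = words.value_counts()

# Number of distinct words.
n_unique = words.nunique()
print(n_unique)  # 9
```

Here `counts` is a Series indexed by word, so `counts["hello"]` gives the frequency of a single word.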