Count unique words with collections and dataframe

June 20, 2022

I want to count the unique words in a dataframe column, but my function only counts the words in the first sentence.

                          text
0  hello is a unique sentences
1         hello this is a test
2              does this works

import pandas as pd
d = {
    "text": ["hello is a unique sentences",
             "hello this is a test", 
             "does this works"],
}
df = pd.DataFrame(data=d)


from collections import Counter

# Count unique words
def counter_word(text_col):
    print(len(text_col.values))
    count = Counter()
    for i, text in enumerate(text_col.values):
        print(i)
        for word in text.split():
            count[word] += 1
        return count

counter = counter_word(df['text'])
len(counter)

>Solution :

I think it is simpler to join the values with spaces, then split the result into words and count them:

counter = Counter((' '.join(df['text'])).split())

print(counter)
Counter({'hello': 2, 'is': 2, 'a': 2, 'this': 2, 'unique': 1, 'sentences': 1, 'test': 1, 'does': 1, 'works': 1})
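
As for why the original function only sees the first sentence: the `return count` line appears to be indented inside the `for` loop, so the function exits after processing the first row. A minimal sketch of the loop version with the `return` moved out of the loop is shown below. It uses a plain list named `texts` (an assumption made here so the snippet runs without pandas) holding the same sentences as the dataframe.

```python
from collections import Counter

# Same sentences as in the question's dataframe (plain list used for illustration)
texts = ["hello is a unique sentences",
         "hello this is a test",
         "does this works"]

def counter_word(text_col):
    """Count how often each whitespace-separated word occurs across all rows."""
    count = Counter()
    for text in text_col:
        # update() adds one to the count of every word in this row
        count.update(text.split())
    # Return only after the loop has visited every row
    return count

counter = counter_word(texts)
print(len(counter))  # 9 unique words
```

Iterating over a pandas Series yields its values, so calling `counter_word(df['text'])` should work the same way with the dataframe from the question.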