How do I get the most frequent words in a column of text based on the value of another column?

I have a dataset of tweets and the year they were posted. I want to get a count of the most frequently occurring words each year. My dataset looks like this:

year     tweet
2015     my car is blue
2015     mom is making dinner
2016     my hair is red
2016     i love my mom

I only know how to get the most frequently occurring words for the entire dataset:

pd.Series(' '.join(df['tweet']).split()).value_counts()

Which would give me this:

my      3
is      3
mom     2
car     1
blue    1
making  1
dinner  1
hair    1
red     1
i       1
love    1

So how would I get something like this?

2015

is      2
my      1
car     1
blue    1
mom     1
making  1
dinner  1

2016

my      2
hair    1
is      1
red     1
i       1
love    1
mom     1

>Solution:

I’d do something like this:

counts = df.set_index('year')['tweet'].str.split().explode().groupby(level=0).apply(lambda x: x.value_counts())

Output:

>>> counts
year        
2015  is        2
      my        1
      car       1
      blue      1
      mom       1
      making    1
      dinner    1
2016  my        2
      hair      1
      is        1
      red       1
      i         1
      love      1
      mom       1
Name: tweet, dtype: int64
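
In case the one-liner is hard to follow, here is a minimal sketch that rebuilds the sample data from the question and runs the same chain one step at a time. The intermediate variable names (tweets_by_year, word_lists, words) are just illustrative; they are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "year": [2015, 2015, 2016, 2016],
    "tweet": [
        "my car is blue",
        "mom is making dinner",
        "my hair is red",
        "i love my mom",
    ],
})

# Step 1: index by year so each tweet carries its year along.
tweets_by_year = df.set_index("year")["tweet"]

# Step 2: split each tweet on whitespace into a list of words.
word_lists = tweets_by_year.str.split()

# Step 3: explode the lists so every word gets its own row
# (the year index is repeated for each word).
words = word_lists.explode()

# Step 4: count the words separately within each year.
# The result is a Series with a (year, word) MultiIndex.
counts = words.groupby(level=0).apply(lambda s: s.value_counts())

# Look at one year, or a single (year, word) pair.
print(counts.loc[2015])
print(counts.loc[(2016, "my")])
```

Setting year as the index is what lets explode() keep track of which year each word came from, so groupby(level=0) can then count per year.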

To get the top, say, 5 items per year:

df.set_index('year')['tweet'].str.split().explode().groupby(level=0).apply(lambda x: x.value_counts().head(5))
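
If you need this for different values of N, the same idea can be wrapped in a small function. This is only a sketch: top_words_per_year is a made-up helper name, and it assumes the columns are called year and tweet as in the question.

```python
import pandas as pd


def top_words_per_year(df: pd.DataFrame, n: int = 5) -> pd.Series:
    """Return the n most frequent words for each year.

    Hypothetical helper; assumes columns named 'year' and 'tweet'.
    The result is a Series of counts indexed by (year, word).
    """
    words = df.set_index("year")["tweet"].str.split().explode()
    return words.groupby(level=0).apply(lambda s: s.value_counts().head(n))


# Sample data copied from the question.
df = pd.DataFrame({
    "year": [2015, 2015, 2016, 2016],
    "tweet": [
        "my car is blue",
        "mom is making dinner",
        "my hair is red",
        "i love my mom",
    ],
})

# Only the single most frequent word per year.
top1 = top_words_per_year(df, n=1)
print(top1)
```

Note that head(n) simply keeps the first n rows after sorting by count, so when several words are tied at the cutoff, which of them you get is arbitrary.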