Frequency Distribution of Bigrams

I have done the following:

import nltk


words = nltk.corpus.brown.words()
freq = nltk.FreqDist(words)

With this I am able to find the frequency of certain words in the Brown corpus, for example:

freq["the"]
62713

Now I want to find the frequency of specific bigrams, so I tried:

bigrams = nltk.bigrams(words)
freqbig = nltk.FreqDist(bigrams)

But for every bigram I look up, I always get 0. For example:

freqbig["the man"]
0

What am I doing wrong?

Solution:

nltk.bigrams yields each bigram as a tuple of two tokens, so freqbig is keyed by tuples, not by str:

freqbig[("the", "man")]

OUTPUT

128

If you would rather pass strings, you could write a small helper function that does the conversion:

def get_frequency(my_string):
    # Split on any whitespace and look up the tuple key, e.g. "the man" -> ("the", "man")
    return freqbig[tuple(my_string.split())]
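To see the difference between string and tuple keys without downloading the Brown corpus, here is a minimal sketch using only the standard library. It relies on the fact that nltk.FreqDist is a subclass of collections.Counter, and it uses zip over adjacent tokens as a stand-in for nltk.bigrams. The toy sentence and the names tokens, bigram_counts and bigram_frequency are made up for illustration and are not part of NLTK.

```python
from collections import Counter

# Toy token list standing in for nltk.corpus.brown.words() (illustrative assumption).
tokens = "the man saw the dog and the man ran".split()

# zip(tokens, tokens[1:]) pairs each token with its successor,
# producing tuples just as nltk.bigrams(tokens) would.
bigram_counts = Counter(zip(tokens, tokens[1:]))


def bigram_frequency(counts, phrase):
    """Look up a whitespace-separated two-word phrase by its tuple key."""
    return counts[tuple(phrase.split())]


# A plain string is never one of the keys, so a Counter reports 0 for it.
print(bigram_counts["the man"])
# The tuple form matches the keys that were actually counted.
print(bigram_counts[("the", "man")])
# The helper converts the string to a tuple before the lookup.
print(bigram_frequency(bigram_counts, "the man"))
```

Looking up a missing key in a Counter (and therefore in a FreqDist) returns 0 instead of raising KeyError, which is why the string lookup fails silently rather than with an error.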