Stemming and lemmatizing words

I have a text document that I need to apply stemming and lemmatization to. I have already cleaned the data, tokenised it, and removed stop words.

What I need to do is take the list of tokens as input and return a dict. The dict should have the keys 'original', 'stem' and 'lemma', with the values being the nth word transformed in each of those ways.

  The Snowball stemmer is defined as Stemmer()
  and the WordNetLemmatizer is defined as lemmatizer()

Here's the code I've written, but it gives an error:

def find_roots(token_list, n):
    n = 2
    original = tokens
    stem = [ele for sub in original for idx, ele in
            enumerate(sub.split()) if idx == (n - 1)]
    stem = stemmer(stem)
    lemma = [ele for sub in original for idx, ele in
             enumerate(sub.split()) if idx == (n - 1)]
    lemma = lemmatizer()
    return

Any help would be appreciated.

>Solution :

I really don’t understand what you are trying to do in the list comprehensions, so I’ll just write how I would do it:

from nltk import WordNetLemmatizer, SnowballStemmer

lemmatizer = WordNetLemmatizer()
stemmer = SnowballStemmer("english")


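def find_roots(token_list, n):
    # n is a 0-based index into token_list
    token = token_list[n]
    stem = stemmer.stem(token)
    # With no pos argument, lemmatize() treats the token as a noun
    lemma = lemmatizer.lemmatize(token)
    return {"original": token, "stem": stem, "lemma": lemma}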


roots_dict = find_roots(["said", "talked", "walked"], n=2)
print(roots_dict)
> {'original': 'walked', 'stem': 'walk', 'lemma': 'walked'}
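
Two details in the output above may be surprising: n=2 picks the third word because the index is 0-based, and the lemma stays 'walked' because the lemmatizer treats the word as a noun unless told otherwise. Here is a rough sketch of a variant that counts from 1 (as the question's `n - 1` suggests) and lemmatizes as a verb. The `stem_fn` and `lemma_fn` parameters and the `nltk_transforms` helper are names made up for this example, and treating every token as a verb is only an assumption; real text would need a part-of-speech tag per token.

```python
from typing import Callable, Dict, List


def find_roots(token_list: List[str], n: int,
               stem_fn: Callable[[str], str],
               lemma_fn: Callable[[str], str]) -> Dict[str, str]:
    """Return the original, stem and lemma of the nth token (n counts from 1)."""
    if not 1 <= n <= len(token_list):
        raise IndexError(f"n must be between 1 and {len(token_list)}, got {n}")
    token = token_list[n - 1]
    return {"original": token, "stem": stem_fn(token), "lemma": lemma_fn(token)}


def nltk_transforms():
    """Build the two callables from NLTK (hypothetical helper for this sketch).

    NLTK is imported here so that find_roots itself has no hard dependency
    on it. The WordNet corpus must be available, e.g. via
    nltk.download("wordnet").
    """
    from nltk.stem import SnowballStemmer, WordNetLemmatizer

    stemmer = SnowballStemmer("english")
    lemmatizer = WordNetLemmatizer()

    def lemmatize_as_verb(token: str) -> str:
        # Assumption: every token is a verb. Without pos="v" the
        # lemmatizer defaults to nouns and leaves "walked" unchanged.
        return lemmatizer.lemmatize(token, pos="v")

    return stemmer.stem, lemmatize_as_verb
```

With NLTK and the WordNet data installed, calling `stem_fn, lemma_fn = nltk_transforms()` and then `find_roots(["said", "talked", "walked"], 3, stem_fn, lemma_fn)` should give 'walk' for both the stem and the lemma.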