Failed lemmatization

I’m trying to lemmatize German texts that are stored in a DataFrame.
I use the german-preprocessing library, which is built to handle German-specific grammatical structure: https://github.com/jfilter/german-preprocessing

My code:

import pandas as pd
from german import preprocess

df = pd.read_csv('Afd.csv', sep=',')

Lemma = open('MessageAFD_lemma.txt', 'w')
for i in df['message']:
    preprocess (i, remove_stop=True)
    Lemma.write(i)
Lemma.close()

The script runs to completion with no errors in the terminal, but when I open the file "MessageAFD_lemma.txt", nothing has been lemmatized: it contains the original messages.

The expected result looks like this:

Input:

preprocess(['Johannes war einer von vielen guten Schülern.', 'Julia trinkt gern Tee.'], remove_stop=True)

Output:
['johannes gut schüler', 'julia trinken tee']

What is going wrong?

Solution:

The preprocess function returns the processed texts as a new list; it does not modify its input. Your loop calls preprocess but discards the return value, then writes the original message i to the file. You need to write the result of preprocess instead.

Furthermore, preprocess accepts a list of texts, so you must wrap each message as [message] and extract the single result from the returned list, for example with result, = ...

import pandas as pd
from german import preprocess

df = pd.read_csv('Afd.csv', sep=',')

# Process one message at a time and write each result on its own line.
with open('MessageAFD_lemma.txt', 'w', encoding='utf-8') as lemma_file:
    for message in df['message']:
        result, = preprocess([message], remove_stop=True)
        lemma_file.write(result + '\n')

# Or, to process all messages in one go:
with open('MessageAFD_lemma.txt', 'w', encoding='utf-8') as f:
    for result in preprocess(df['message'].tolist(), remove_stop=True):
        f.write(result + '\n')
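
To see the two points in isolation, here is a minimal sketch that runs without the library installed. fake_preprocess is a hypothetical stand-in, not part of german-preprocessing: it only lowercases the texts, but like preprocess it takes a list and returns a new list.

```python
def fake_preprocess(texts, remove_stop=False):
    """Hypothetical stand-in for preprocess: returns a NEW list of lowercased texts."""
    return [text.lower() for text in texts]


message = "Julia trinkt gern Tee."

# Calling the function and ignoring its return value changes nothing:
# Python strings are immutable, so `message` is still the original text.
fake_preprocess([message], remove_stop=True)
unchanged = message

# Capture the return value instead. The trailing comma in `result, = ...`
# unpacks a one-element list and raises ValueError if the length is not 1.
result, = fake_preprocess([message], remove_stop=True)

print(unchanged)  # Julia trinkt gern Tee.
print(result)     # julia trinkt gern tee.
```

The same pattern applies to the real call: whatever you write to the file has to come from the return value, not from the loop variable.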