Reading files in a directory and saving each dynamically

    text_open = open("inputfiles/(22).txt", "r")
    text = text_open.read()
    doc = nlp(text)
    db.add(doc)
    db.to_disk("./outputfiles/22.spacy")

I am trying to loop over each of the 500+ documents in the inputfiles folder and output them through db.to_disk. Instead of changing the hard-coded numbers every time, how would I dynamically rename each new output file to match the input file? If…
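
One way to derive each output name from its input name is sketched below with pathlib. The helper names output_path_for and plan_outputs are hypothetical, and stripping the parentheses from the file stem is an assumption based on the single "(22).txt" to "22.spacy" example in the question. The spaCy calls themselves are left out, since the rest of the question is truncated.

```python
from pathlib import Path


def output_path_for(input_path, output_dir="outputfiles", suffix=".spacy"):
    """Build an output path that reuses the input file's base name.

    Assumption: surrounding parentheses are dropped, so "(22).txt"
    becomes "22.spacy", mirroring the example in the question.
    """
    stem = Path(input_path).stem.strip("()")
    return Path(output_dir) / f"{stem}{suffix}"


def plan_outputs(input_dir="inputfiles", pattern="*.txt"):
    """Pair every matching input file with its derived output path."""
    inputs = sorted(Path(input_dir).glob(pattern))
    return [(path, output_path_for(path)) for path in inputs]
```

Inside a loop over plan_outputs(), each input file would be read and processed as in the snippet above, and the derived path passed to db.to_disk in place of the hard-coded name. Whether a fresh db object is needed per file depends on details the excerpt does not show.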

Python NLP processing if statement not in stop words list

I'm working with the spaCy NLP library and I created a function to return a list of tokens from a text.

    import spacy

    def preprocess_text_spacy(text):
        stop_words = ["a", "the", "is", "are"]
        nlp = spacy.load('en_core_web_sm')
        tokens = set()
        doc = nlp(text)
        for word in doc:
            if word.is_currency:
                tokens.add(word.lower_)
            elif len(word.lower_) == 1:
                if word.is_digit and float(word.text) ==…
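
The title refers to a "not in stop words" check. A minimal sketch of that membership test follows, using plain strings in place of spaCy tokens so it runs without a model. The function name filter_stop_words is hypothetical, and the rest of the original function is truncated, so this is not a reconstruction of it.

```python
# Same four words as in the excerpt, stored as a set for fast lookups.
STOP_WORDS = {"a", "the", "is", "are"}


def filter_stop_words(words, stop_words=STOP_WORDS):
    """Return the lowercased words that are not in the stop-word set."""
    tokens = set()
    for word in words:
        lowered = word.lower()
        if lowered not in stop_words:
            tokens.add(lowered)
    return tokens
```

With spaCy tokens, word.lower_ would take the place of word.lower() in the comparison.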

spacy matcher pattern IN + REGEX Tag

My goal is to use spaCy to match the sentences that contain one of the following words: ['studium', 'abschluss', 'ausbildung']. I can solve the problem with this line:

    pattern = [{"LOWER": {'IN': ['studium', 'abschluss', 'ausbildung']}}]

My problem is that German makes heavy use of compound words such as Hochschulstudium, Masterstudium, Studiengang, etc. How can I use the regex inside…
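
The compound-word problem can be illustrated with a plain regular expression that looks for the keywords anywhere inside a lowercased token. This sketch uses only Python's re module; KEYWORD_RE and contains_keyword are hypothetical names.

```python
import re

# Matches any token that contains one of the three keywords as a substring.
KEYWORD_RE = re.compile(r"(studium|abschluss|ausbildung)")


def contains_keyword(token_text):
    """Return True if the lowercased token contains one of the keywords."""
    return KEYWORD_RE.search(token_text.lower()) is not None
```

Note that Studiengang does not contain "studium", so covering it would need a shorter stem such as "studi". spaCy's Matcher documents a REGEX operator for token attributes, so the same expression could presumably be supplied as {"LOWER": {"REGEX": "(studium|abschluss|ausbildung)"}}. That combination is untested here.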

What does `i` in `token.i+1` mean when using a token returned by spacy's Language?

    from spacy.language import Language

    @Language.component("CustomB")
    def set_custom_boundaries(doc):
        for token in doc[:-1]:
            if token.text == ';':
                doc[token.i+1].is_sent_start = True
        return doc

    nlp.add_pipe("CustomB", before="parser")

All I need to know is what i+1 does in this code: doc[token.i+1], given that i is not defined in the function, neither as an index nor as a simple variable.

>Solution :…

Access n:th element of every list in list of lists

I have a problem in Python, and one step I need to do is to access the second element of every list in a list of lists. The list is:

    [(0, 'Gallery', 'PROPN', 'nsubj'), (1, 'unveils', 'VERB', 'root'), (2, 'interactive', 'ADJ', 'amod')]
    [(0, 'A', 'DET', 'det'), (1, 'Christmas', 'PROPN', 'compound'), (2, 'tree', 'NOUN', 'nsubjpass')]…
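
A list comprehension is one common way to pull the n-th item out of every inner sequence. The sketch below assumes that "second element" means index 1 of each tuple (the word itself) and that the two lines shown are two entries of an outer list. The names sentences and nth_of_each are hypothetical.

```python
# The two rows from the question, wrapped in an outer list (assumption).
sentences = [
    [(0, "Gallery", "PROPN", "nsubj"), (1, "unveils", "VERB", "root"),
     (2, "interactive", "ADJ", "amod")],
    [(0, "A", "DET", "det"), (1, "Christmas", "PROPN", "compound"),
     (2, "tree", "NOUN", "nsubjpass")],
]


def nth_of_each(rows, n):
    """Return the n-th item (0-based) of every row."""
    return [row[n] for row in rows]


# Index 1 picks the word out of each (index, word, pos, dep) tuple.
second_items = [nth_of_each(sentence, 1) for sentence in sentences]
```

Applying nth_of_each(sentences, 1) directly would instead return the second tuple of each sentence, which is the other possible reading of the question.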

spacy models in django

I have written the following code and placed it in the settings.py script of my Django backend. My app also has an Angular frontend.

    import spacy
    from spacy import displacy
    from spacy.lang.en import English

    SUPPORTED_LANGUAGES = ['de', 'en']
    LANGUAGE_MODELS = {}
    for language in SUPPORTED_LANGUAGES:
        try:
            LANGUAGE_MODELS[language] = spacy.load(language)
        except OSError:
            print('Warning: model {} not…