Reading files in a directory and saving each dynamically

text_open = open("inputfiles/(22).txt", "r")
text = text_open.read()
doc = nlp(text)
db.add(doc)
db.to_disk("./outputfiles/22.spacy")

I am trying to loop over each of the 500+ documents in the inputfiles folder and write each one out through db.to_disk. Instead of changing the hard-coded numbers every time, how would I dynamically name each new output file to match its input file? If…
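One possible way to derive each output name from its input name is sketched below with pathlib. The helper names output_path_for and plan_conversions are hypothetical, and stripping the parentheses from "(22)" is an assumption based on the single example in the question. The spaCy step is left as a comment because the question does not show how nlp and db are created.

```python
from pathlib import Path


def output_path_for(input_path: Path, output_dir: Path) -> Path:
    """Map e.g. inputfiles/(22).txt to outputfiles/22.spacy (hypothetical helper)."""
    stem = input_path.stem.strip("()")  # "(22)" -> "22"; assumed naming rule
    return output_dir / f"{stem}.spacy"


def plan_conversions(input_dir: str, output_dir: str) -> list[tuple[Path, Path]]:
    """Pair every .txt file in input_dir with its matching output path."""
    out = Path(output_dir)
    return [(p, output_path_for(p, out)) for p in sorted(Path(input_dir).glob("*.txt"))]


if __name__ == "__main__":
    for src, dst in plan_conversions("inputfiles", "./outputfiles"):
        text = src.read_text(encoding="utf-8")
        # The per-file spaCy work from the question would go here, for example:
        # doc = nlp(text), then add doc to a fresh container and call to_disk(dst).
        print(f"{src} -> {dst}")
```

Creating a fresh container per file (rather than reusing one db) is probably what is wanted here, since otherwise every output file would also contain the documents added before it.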

March 19, 2023 MR

Python NLP processing if statement not in stop words list

I’m working with the spaCy NLP library and I created a function that returns a list of tokens from a text.

import spacy

def preprocess_text_spacy(text):
    stop_words = ["a", "the", "is", "are"]
    nlp = spacy.load('en_core_web_sm')
    tokens = set()
    doc = nlp(text)
    for word in doc:
        if word.is_currency:
            tokens.add(word.lower_)
        elif len(word.lower_) == 1:
            if word.is_digit and float(word.text) ==…
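The excerpt is cut off before the stop-word check itself, so the following is only a minimal sketch of the "not in stop words" test named in the title. It works on plain strings rather than spaCy tokens, and keep_non_stop_words is a hypothetical name.

```python
STOP_WORDS = {"a", "the", "is", "are"}  # same words as the list in the question


def keep_non_stop_words(words):
    """Return the lowercased words that are not stop words, without duplicates."""
    tokens = set()
    for word in words:
        lowered = word.lower()
        if lowered not in STOP_WORDS:  # the membership test the title refers to
            tokens.add(lowered)
    return tokens


print(sorted(keep_non_stop_words("The cat is on a mat".split())))
```

Using a set for the stop words keeps each membership test cheap, which matters more as the stop-word list grows.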

March 9, 2023 MR

Spacy incorrectly identifying pronouns

When I try this code using spaCy, I get the desired result:

import spacy
nlp = spacy.load("en_core_web_sm")

# example 1
test = "All my stuff is at to MyBOQ"
doc = nlp(test)
for word in doc:
    if word.pos_ == 'PRON':
        print(word.text)

The output shows All and my. However, if I add a question mark: test…

January 17, 2023 MR

spacy matcher pattern IN + REGEX Tag

My goal is to use spaCy to match the sentences that contain one of the following words: ['studium', 'abschluss', 'ausbildung']. I can solve the problem with this line:

pattern = [{"LOWER": {'IN': ['studium', 'abschluss', 'ausbildung']}}]

My problem is that German makes heavy use of compound words such as Hochschulstudium, Masterstudium, Studiengang etc. How can I use the regex inside…
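One way to cover compounds is to replace the IN list with a single regular expression that looks for any keyword inside the token. The sketch below checks the expression with Python's re module and builds a Matcher-style pattern using the REGEX operator that spaCy documents for token attributes. It assumes that operator searches within the token text rather than requiring a full match. The names KEYWORD_REGEX and contains_keyword are hypothetical.

```python
import re

KEYWORDS = ["studium", "abschluss", "ausbildung"]

# One alternation that finds any keyword anywhere inside a lowercased token.
KEYWORD_REGEX = "(?:" + "|".join(KEYWORDS) + ")"

# Matcher-style pattern; assumes the REGEX operator uses search semantics.
pattern = [{"LOWER": {"REGEX": KEYWORD_REGEX}}]


def contains_keyword(token_text: str) -> bool:
    """Check the regex outside spaCy, on the lowercased token text."""
    return re.search(KEYWORD_REGEX, token_text.lower()) is not None


for word in ["Hochschulstudium", "Masterstudium", "Studiengang", "Ausbildung"]:
    print(word, contains_keyword(word))
```

Note that Studiengang does not contain any of the three keywords, so it would need its own entry (or a shorter stem) in KEYWORDS to be matched.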

November 9, 2022 MR

Extracting start and end indices of a token using spacy

I am working with a large number of sentences and want to extract the start and end indices of a word in a given sentence. For example, the input is: This is a sentence written in english by a native English speaker. What I want is the span of the word 'English', which in this case…
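As a rough illustration of what "start and end indices" can mean, the sketch below computes character offsets with Python's re module instead of spaCy. The function name word_spans is hypothetical. In spaCy itself, each token exposes its starting character offset as token.idx, so the end offset can be taken as token.idx + len(token.text).

```python
import re


def word_spans(sentence: str, word: str, ignore_case: bool = False):
    """Return (start, end) character offsets of every whole-word occurrence."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(r"\b" + re.escape(word) + r"\b", flags)
    return [match.span() for match in regex.finditer(sentence)]


sentence = "This is a sentence written in english by a native English speaker."
print(word_spans(sentence, "English"))                    # case-sensitive
print(word_spans(sentence, "English", ignore_case=True))  # both spellings
```

The end offset is exclusive, so sentence[start:end] gives back the matched word.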

May 10, 2022 MR

What does `i` in `token.i+1` mean when using a token returned by spacy's Language?

from spacy.language import Language

@Language.component("CustomB")
def set_custom_boundaries(doc):
    for token in doc[:-1]:
        if token.text == ';':
            doc[token.i+1].is_sent_start = True
    return doc

nlp.add_pipe("CustomB", before="parser")

All I need to know is what i+1 does in this code: doc[token.i+1], given that i is not defined in the function, either as an index or as a simple variable.

>Solution :…
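In spaCy, i is an attribute of the token: the token's index within its parent document, so doc[token.i+1] is simply the next token. The toy model below mimics that idea without spaCy. ToyToken, make_doc and mark_after_semicolon are hypothetical stand-ins, not spaCy classes or functions.

```python
from dataclasses import dataclass


@dataclass
class ToyToken:
    text: str
    i: int                       # position of this token in its parent sequence
    is_sent_start: bool = False


def make_doc(words):
    """Build a list of toy tokens, each one remembering its own index."""
    return [ToyToken(text=w, i=idx) for idx, w in enumerate(words)]


def mark_after_semicolon(doc):
    """Flag the token that follows each ';' as a sentence start."""
    for token in doc[:-1]:       # skip the last token: it has no successor
        if token.text == ";":
            doc[token.i + 1].is_sent_start = True
    return doc


doc = mark_after_semicolon(make_doc(["first", "part", ";", "second", "part"]))
print([t.text for t in doc if t.is_sent_start])
```

Iterating over doc[:-1] is what keeps token.i + 1 from running past the end of the document.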

April 24, 2022 MR

Access n:th element of every list in list of lists

I have a problem in Python, and one step I need to do is to access the second element of every list in a list of lists. The list is:

[(0, 'Gallery', 'PROPN', 'nsubj'), (1, 'unveils', 'VERB', 'root'), (2, 'interactive', 'ADJ', 'amod')]
[(0, 'A', 'DET', 'det'), (1, 'Christmas', 'PROPN', 'compound'), (2, 'tree', 'NOUN', 'nsubjpass')]…
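A list comprehension is one common way to pull the same position out of every inner item. The sketch below assumes the data is a list of sentences, each a list of tuples shaped like the ones in the question. nth_of_each is a hypothetical helper name.

```python
sentences = [
    [(0, "Gallery", "PROPN", "nsubj"), (1, "unveils", "VERB", "root"),
     (2, "interactive", "ADJ", "amod")],
    [(0, "A", "DET", "det"), (1, "Christmas", "PROPN", "compound"),
     (2, "tree", "NOUN", "nsubjpass")],
]


def nth_of_each(rows, n):
    """Return element n of every row (rows may be tuples or lists)."""
    return [row[n] for row in rows]


# Index 1 is the second element, here the word itself.
words = [nth_of_each(sentence, 1) for sentence in sentences]
print(words)
```

Changing the index to 2 or 3 would collect the part-of-speech tags or dependency labels in the same way.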

February 6, 2022 MR

spacy models in django

I have written the following code and placed it in the settings.py script of my Django backend. My app also has an Angular frontend.

import spacy
from spacy import displacy
from spacy.lang.en import English

SUPPORTED_LANGUAGES = ['de', 'en']
LANGUAGE_MODELS = {}

for language in SUPPORTED_LANGUAGES:
    try:
        LANGUAGE_MODELS[language] = spacy.load(language)
    except OSError:
        print('Warning: model {} not…

January 25, 2022 MR

Dev solutions

Solutions for development problems

Tag: spacy
