Spacy Regex "SyntaxError: invalid syntax"

Hi everyone, I am running this code in spaCy to match tokens with the Matcher, but I get an error:

import spacy
from spacy.matcher import Matcher
nlp = spacy.load("en_core_web_md")
doc1 = nlp("Hello hello hello, how are you?")
doc2 = nlp("Hello, how are you?")
doc3 = nlp("How are you?")
pattern = [{"LOWER": {"IN": ["hello", "hi", "hallo"]},"OP": "*",{"IS_PUNCT": True}}]
matcher.add("greetings", [pattern])
for mid, start, end in matcher(doc1):
    print(start, end, doc1[start:end])

The error is

pattern = [{"LOWER": {"IN": ["hello", "hi", "hallo"]},"OP": "*",{"IS_PUNCT": True}}]
                                                                                  ^
SyntaxError: invalid syntax

I am following a book called Mastering spaCy and copy-pasted the code from it, and I checked that I did not include any special characters.

Regards

>Solution :

From the docs: "A pattern added to the Matcher consists of a list of dictionaries."

Your code, written more legibly:

pattern = [
    {
        "LOWER": {"IN": ["hello", "hi", "hallo"]},
        "OP": "*",
        {"IS_PUNCT": True}
    }
]

The first dictionary has three entries, but the third is malformed: every entry in a dictionary must be a key: value pair, and {"IS_PUNCT": True} is a lone value with no key, which is not valid dictionary syntax. That is where Python raises the SyntaxError.
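To see that this is a plain Python syntax problem and not a spaCy one, here is a minimal sketch that uses only the standard library. The helper name `parses` is made up for illustration; it asks Python to parse each pattern as a literal and reports whether that succeeds.

```python
import ast

# The pattern exactly as posted: the third entry of the dict has no key.
broken = '[{"LOWER": {"IN": ["hello", "hi", "hallo"]}, "OP": "*", {"IS_PUNCT": True}}]'

# The same data as two separate dictionaries in one list.
fixed = '[{"LOWER": {"IN": ["hello", "hi", "hallo"]}, "OP": "*"}, {"IS_PUNCT": True}]'


def parses(source: str) -> bool:
    """Return True if `source` is a syntactically valid Python literal."""
    try:
        ast.literal_eval(source)
        return True
    except SyntaxError:
        return False


print(parses(broken))  # False
print(parses(fixed))   # True
```

No spaCy import is involved, so the error would appear in any Python file containing that line.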

Along those lines, the docs also say: "Each dictionary describes one token and its attributes."

A token that, lowercased, is in ["hello", "hi", "hallo"] can never be punctuation, so the two conditions cannot describe the same token and need separate dictionaries. You seem to want to match something like "Hi Hi Hello!": a greeting token that may repeat, followed by a punctuation token. That would be matched by something like:

pattern = [
    {
        "LOWER": {"IN": ["hello", "hi", "hallo"]},
        "OP": "*",
    },
    { "IS_PUNCT": True }
]
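For completeness, here is a sketch of the corrected pattern in a full script. It rests on two assumptions that are not in the question: it uses `spacy.blank("en")` instead of `en_core_web_md`, since `LOWER` and `IS_PUNCT` are lexical attributes that need only the tokenizer, and it creates the `matcher` object, which the posted code uses but never defines.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a blank English pipeline (tokenizer only) is enough here,
# because the pattern checks only lexical attributes.
nlp = spacy.blank("en")

# The posted code never creates the matcher; it has to be built from the vocab.
matcher = Matcher(nlp.vocab)

pattern = [
    # Zero or more greeting tokens, compared in lowercase.
    {"LOWER": {"IN": ["hello", "hi", "hallo"]}, "OP": "*"},
    # Followed by exactly one punctuation token.
    {"IS_PUNCT": True},
]
matcher.add("greetings", [pattern])

doc1 = nlp("Hello hello hello, how are you?")

# Each match is a (match_id, start, end) triple of token indices.
spans = [(start, end) for _, start, end in matcher(doc1)]
for start, end in spans:
    print(start, end, doc1[start:end])
```

Because "*" also allows zero repetitions, a punctuation token on its own matches as well. If at least one greeting is required, "OP": "+" is the operator to use instead.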