How to get mentions in PyTorch NER instead of tokens?

February 3, 2022

I am using PyTorch and a pre-trained model.

Here is my code:

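from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline


class NER(object):
    def __init__(self, model_name_or_path, tokenizer_name_or_path):
        self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_name_or_path)
        self.model = AutoModelForTokenClassification.from_pretrained(
            model_name_or_path)
        # Token-classification pipeline; as configured here it returns one dict per token.
        self.nlp = pipeline("ner", model=self.model, tokenizer=self.tokenizer)

    def get_mention_entities(self, query):
        # Run NER on the query string and return the pipeline's raw output.
        return self.nlp(query)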

When I call get_mention_entities and print its output for "اینجا دانشگاه صنعتی امیرکبیر است.", it gives:

[{'entity': 'B-FAC', 'score': 0.9454591, 'index': 2, 'word': 'دانشگاه', 'start': 6, 'end': 13},
 {'entity': 'I-FAC', 'score': 0.9713519, 'index': 3, 'word': 'صنعتی', 'start': 14, 'end': 19},
 {'entity': 'I-FAC', 'score': 0.9860724, 'index': 4, 'word': 'امیرکبیر', 'start': 20, 'end': 28}]

As you can see, it recognizes the university name, but returns it as three separate tokens in the list.

Is there any standard way to combine these tokens based on the "entity" attribute?

The desired output is something like:

[{'entity': 'FAC', 'word': 'دانشگاه صنعتی امیرکبیر', 'start': 6, 'end': 28}]

I could write a function to iterate over the tokens, compare them, and merge them based on the "entity" attribute, but I would prefer a standard way, such as a built-in PyTorch function or something similar.

My question is similar to this question.

PS: "دانشگاه صنعتی امیرکبیر" is a university name.

>Solution :

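Hugging Face's NER pipeline accepts the argument grouped_entities=True, which does exactly what you are looking for: it groups the B- and I- tagged tokens into unified entities. Note that this comes from the transformers library rather than from PyTorch itself, and that more recent transformers releases deprecate this flag in favor of aggregation_strategy="simple".

Changing the pipeline construction to

self.nlp = pipeline("ner", model=self.model, tokenizer=self.tokenizer, grouped_entities=True)

should do the trick.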
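If the built-in grouping is not available in your version, the manual merge mentioned in the question can be sketched in plain Python. This is only an illustration, not part of transformers: the helper name merge_bio_tokens is hypothetical, and it assumes each token dict has the 'entity', 'index', 'word', 'start', and 'end' keys shown in the output above, and that joining words with a single space is acceptable (sub-word pieces are not handled).

```python
def merge_bio_tokens(tokens):
    """Merge consecutive B-/I- tagged tokens into whole mentions.

    Hypothetical helper: assumes the per-token dict layout shown in the question.
    """
    mentions = []
    prev_index = None  # index of the last token that was part of a mention
    for tok in tokens:
        # "B-FAC" -> prefix "B", label "FAC"; a bare "O" tag has no label.
        prefix, _, label = tok["entity"].partition("-")
        if not label:
            prev_index = None
            continue
        continues = (
            prefix == "I"
            and bool(mentions)
            and prev_index is not None
            and tok["index"] == prev_index + 1   # directly follows the previous token
            and mentions[-1]["entity"] == label  # same entity type
        )
        if continues:
            # Extend the current mention; joining with a space is an assumption.
            mentions[-1]["word"] += " " + tok["word"]
            mentions[-1]["end"] = tok["end"]
        else:
            # A B- tag (or a stray I- tag) starts a new mention.
            mentions.append({
                "entity": label,
                "word": tok["word"],
                "start": tok["start"],
                "end": tok["end"],
            })
        prev_index = tok["index"]
    return mentions


# The token list from the question.
sample = [
    {"entity": "B-FAC", "score": 0.9454591, "index": 2, "word": "دانشگاه", "start": 6, "end": 13},
    {"entity": "I-FAC", "score": 0.9713519, "index": 3, "word": "صنعتی", "start": 14, "end": 19},
    {"entity": "I-FAC", "score": 0.9860724, "index": 4, "word": "امیرکبیر", "start": 20, "end": 28},
]
merged = merge_bio_tokens(sample)
```

For the sample above, merged holds a single FAC mention spanning characters 6 to 28. The pipeline's own grouped output is shaped slightly differently: it reports the label under the key 'entity_group' and also includes a 'score'.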

by MR

Published February 03, 2022
