Python regex lookbehind to remove _sublabel1 in a string like "__label__label1_sublabel1"

I have a dataset prepared for training with fastText, and I want to remove the sublabels from it.
For example:

__label__label1_sublabel1 __label__label2_sublabel1 __label__label3 __label__label1_sublabel4 sometext some sentce som data.

Any help is much appreciated.
Thanks.

I tried this:

r'(?<=__label__[^_]+)\w+'

It isn't working.
The exact code:

ptrn = r'(?<=__label__[^_]+)\w+'

re.sub(ptrn, '', test_String)

and this error occurred:

error Traceback (most recent call last)
c:\Users\THoseini\Desktop\projects\ensani_classification\tes4t.ipynb
Cell 3 in <cell line: 3>()
1 ptrn = r'(?<=__label__[^_]+)\w+'
----> 3 re.sub(ptrn, '', test_String)

File
c:\Users\THoseini\AppData\Local\Programs\Python\Python310\lib\re.py:209,
in sub(pattern, repl, string, count, flags)
202 def sub(pattern, repl, string, count=0, flags=0):
203 """Return the string obtained by replacing the leftmost
204 non-overlapping occurrences of the pattern in string by the
205 replacement repl. repl can be either a string or a callable;
206 if a string, backslash escapes in it are processed. If it is
207 a callable, it's passed the Match object and must return
208 a replacement string to be used."""
--> 209 return _compile(pattern, flags).sub(repl, string, count)

File
c:\Users\THoseini\AppData\Local\Programs\Python\Python310\lib\re.py:303,
in _compile(pattern, flags)
301 if not sre_compile.isstring(pattern):
302 raise TypeError("first argument must be string or compiled pattern")
--> 303 p = sre_compile.compile(pattern, flags)
304 if not (flags & DEBUG):
305 if len(_cache) >= _MAXCACHE:
306 # Drop the oldest item

File
c:\Users\THoseini\AppData\Local\Programs\Python\Python310\lib\sre_compile.py:792,
in compile(p, flags)
--> 198 raise error("look-behind requires fixed-width pattern")
199 emit(lo) # look behind
200 _compile(code, av[1], flags)

error: look-behind requires fixed-width pattern
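The error comes from a limitation of Python's built-in re module: the pattern inside a lookbehind must match a fixed number of characters, and [^_]+ can match any length. The minimal sketch below illustrates this; the helper name compiles and the two pattern variables are made up for the illustration and are not part of the original question.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if the built-in re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # Raised at compile time, before any string is searched.
        return False

# Variable-width lookbehind: [^_]+ can match one or more characters.
variable_width = r'(?<=__label__[^_]+)\w+'

# Fixed-width lookbehind: the literal "__label__" is always 9 characters.
fixed_width = r'(?<=__label__)\w+'

print(compiles(variable_width))  # False
print(compiles(fixed_width))     # True
```

Because the label name itself has no fixed length, a lookbehind cannot express "everything after the label" here, which is why a capture group is the usual workaround.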

>Solution :

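Try this regex:

(__label__[^_\s]+)_\w+

and sample code in Python:

import re
test_string = """__label__label1_sublabel1 __label__label2_sublabel1 __label__label3 __label__label1_sublabel4 sometext some sentce som data."""

# Group 1 captures the label up to the next underscore; _\w+ matches the sublabel suffix.
ptrn = r'(__label__[^_\s]+)_\w+'
print(re.sub(ptrn, r'\1', test_string))

re.sub() (short for "substitute") returns a copy of the string in which every match of the pattern is replaced. Here the replacement \1 is the text captured by group 1, so the label is kept and the sublabel suffix is dropped.
[^character_group] is a negated character class: it matches any single character that is not in character_group. \w matches any word character (letters, digits, and the underscore), and \s matches any whitespace character.
The literal _ before \w+ matters. Without it, a label that has no sublabel, such as __label__label3, would still match after backtracking and lose its last character, giving __label__label.

The output is as expected:

__label__label1 __label__label2 __label__label3 __label__label1 sometext some sentce som data.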
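If a regex is not strictly required, the same cleanup can be sketched with plain string methods. This is only an illustration under the assumptions stated in the question: labels start with __label__, tokens are separated by spaces, and the first underscore after the label name begins the sublabel. The names PREFIX, strip_sublabel, and strip_sublabels are hypothetical.

```python
PREFIX = "__label__"  # fastText label prefix, as used in the question

def strip_sublabel(token: str) -> str:
    """Drop everything from the first '_' after the label name; leave other tokens alone."""
    if not token.startswith(PREFIX):
        return token
    label = token[len(PREFIX):].split("_", 1)[0]
    return PREFIX + label

def strip_sublabels(line: str) -> str:
    """Apply strip_sublabel to every space-separated token in a line."""
    return " ".join(strip_sublabel(tok) for tok in line.split(" "))

line = ("__label__label1_sublabel1 __label__label2_sublabel1 "
        "__label__label3 __label__label1_sublabel4 sometext some sentce som data.")
print(strip_sublabels(line))
# __label__label1 __label__label2 __label__label3 __label__label1 sometext some sentce som data.
```

Tokens that do not start with the prefix are returned unchanged, so underscores inside the ordinary text are not affected.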