Python Regex: How to remove brackets and \n

I am trying to get rid of the brackets ([]) and the \n that show up when I use a regex to pull a score out of each description. Here is an example of what a description looks like:

On Liars’ fifth album, the band throws their music into the deepest, darkest, most serpent-filled hole they could possibly find. They create an alternate dimension where eerie sounds and dissonant intervals reign supreme. There are some lush strings, a spot of horns; but Sisterworld revolves mostly around the simplicity of normal rock instrumentation. Yet, it sounds so otherworldly.
Overall, it lays on the tension a little too much, kind of making it difficult to enjoy. But it would be stupid to assume that wasn’t the underlying intent. Though I wasn’t able to enjoy that anxiety as much as some people may, I’ve still got to give kudos to these guys for turning another cohesive concept into an album.
6/10
http://theneedledrop.com
http://twitter.com/theneedledrop

I then use the following regex to grab only the score from this text. The score in this example is 6/10.


df["Description"].str.findall("[0-9]\/10[^a-zA-Z0-9]")

However, the output I get from this regex is:

[6/10\n]

Is there a way, with a regex, to get rid of the brackets and the \n, and to make sure I only grab scores in the format number(0-10)/10?

>Solution :

Use a lookahead:

df["Description"].str.findall(r"[0-9]/10(?=[^a-zA-Z0-9])")
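To see why the lookahead helps, here is a small sketch using Python's re module directly (pandas' .str.findall relies on re). The text is a shortened, assumed copy of the review from the question. The original pattern consumes the character after the score, while (?=...) only checks it.

```python
import re

# Shortened copy of the review text from the question (assumed sample).
text = (
    "...turning another cohesive concept into an album.\n"
    "6/10\n"
    "http://theneedledrop.com\n"
)

# Original pattern: [^a-zA-Z0-9] consumes the newline, so it ends up in the match.
consuming = re.findall(r"[0-9]/10[^a-zA-Z0-9]", text)

# Lookahead version: (?=...) checks the next character without consuming it.
lookahead = re.findall(r"[0-9]/10(?=[^a-zA-Z0-9])", text)

print(consuming)  # ['6/10\n']
print(lookahead)  # ['6/10']
```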

Or use a negative lookahead, which only rejects a following digit:

df["Description"].str.findall(r"[0-9]/10(?!\d)")
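The two lookaheads differ when the score is the very last thing in a description. A minimal sketch with re, assuming a hypothetical description that ends right after the score: the positive lookahead needs some non-alphanumeric character to follow, while the negative lookahead is also satisfied at the end of the string.

```python
import re

# Hypothetical description where nothing at all follows "7/10".
text = "Great record overall.\n7/10"

# Positive lookahead: requires a real non-alphanumeric character after the score.
positive = re.findall(r"[0-9]/10(?=[^a-zA-Z0-9])", text)

# Negative lookahead: only forbids a following digit, so end-of-string is fine.
negative = re.findall(r"[0-9]/10(?!\d)", text)

print(positive)  # []
print(negative)  # ['7/10']
```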

Output:

0    [6/10]
Name: Description, dtype: object

NB. The brackets come from findall: it returns a list of matches for each row, and the brackets are just how that list is displayed. If you want plain strings instead, use extract (first match) or extractall (all matches) in place of findall:

df["Description"].str.extract(r"([0-9]/10)\D", expand=False)
# or
df["Description"].str.extract(r"([0-9]/10)(?!\d)", expand=False)
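As a rough per-row analogue of extract(..., expand=False), the sketch below takes capture group 1 of the first re.search match. The helper first_score and both sample strings are hypothetical. It also shows that \D has to consume a real character, so that variant misses a score at the very end of a description, while (?!\d) does not.

```python
import re

text = "...into an album.\n6/10\nhttp://theneedledrop.com"
ending = "Great record overall.\n7/10"

def first_score(pattern, s):
    """Return capture group 1 of the first match, or None (hypothetical helper).

    Roughly mirrors Series.str.extract(..., expand=False) for one row;
    pandas puts NaN rather than None in rows with no match.
    """
    m = re.search(pattern, s)
    return m.group(1) if m else None

# Both patterns return only the captured text: no list, no "\n".
print(first_score(r"([0-9]/10)\D", text))        # 6/10
print(first_score(r"([0-9]/10)(?!\d)", text))    # 6/10

# \D must match an actual character, so it fails when the score ends the string.
print(first_score(r"([0-9]/10)\D", ending))      # None
print(first_score(r"([0-9]/10)(?!\d)", ending))  # 7/10
```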

Or for all matches:

df["Description"].str.extractall(r"([0-9]/10)\D")
# or
df["Description"].str.extractall(r"([0-9]/10)(?!\d)")
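For the "all matches" case, a minimal re sketch on a hypothetical description that mentions two scores: when the pattern has exactly one capturing group, re.findall returns just that group's text for every match, which corresponds to the values extractall puts in its column 0.

```python
import re

# Hypothetical description containing two scores.
text = "Last album got a 5/10.\nThis one: 8/10\n"

# One capturing group -> findall returns only the captured text per match.
scores = re.findall(r"([0-9]/10)(?!\d)", text)
print(scores)  # ['5/10', '8/10']

# finditer gives the same values plus match objects, e.g. for positions.
positions = [m.start(1) for m in re.finditer(r"([0-9]/10)(?!\d)", text)]
print(positions)
```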

Output (with extract):

0    6/10
Name: Description, dtype: object
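One caveat: [0-9] matches a single digit, so a perfect score written as 10/10 comes back as 0/10. A sketch of one way to cover the full 0 to 10 range asked about in the question, assuming hypothetical sample descriptions: (?<!\d) stops a match from starting in the middle of a longer number, and (?:10|[0-9]) allows only 0 through 10 before the slash.

```python
import re

# Hypothetical descriptions covering the 0-10 range and a non-score.
texts = ["Masterpiece.\n10/10\n", "Solid.\n6/10\n", "Odd one: 0/10", "Not a score: 25/100"]

single_digit = r"[0-9]/10(?!\d)"
# Lookbehind forbids a preceding digit; the alternation allows 10 or one digit.
zero_to_ten = r"(?<!\d)(?:10|[0-9])/10(?!\d)"

old = [re.findall(single_digit, t) for t in texts]
new = [re.findall(zero_to_ten, t) for t in texts]

print(old)  # [['0/10'], ['6/10'], ['0/10'], []]
print(new)  # [['10/10'], ['6/10'], ['0/10'], []]
```

The same pattern should work as-is with .str.findall; for extract/extractall it would need a capturing group around the score, e.g. r"(?<!\d)((?:10|[0-9])/10)(?!\d)".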