Return value from previous row in regex

I am looking to return a specific group in the previous row via regex.

Suppose I have the following data, and the goal is to extract the value 90 based on the tag that appears on the line after it.

QTY+66:90:PCE
SCC+2
DTM+45:20200416:15
QTY+66:60:PCE
SCC+3
DTM+35:20210614:2

If I were to target the value 90, I'd have to look for the SCC+2 tag, and if I were to look for the value 60, it would be the SCC+3 tag.

I got this far in an attempt to return the value 90: (?<=^QTY\+66:)(\d+)(.*\n.*SCC\+2.*), but it seems convoluted and I cannot extract only Group 1. I tested it on regex101, and I am using R for the actual application. Thanks for the help!

>Solution :

You can use

(?<=:)\d+(?=[^\d\r\n]*[\r\n]+.*SCC\+2)

Details:

  • (?<=:) – a : must occur immediately to the left of the current location
  • \d+ – one or more digits
  • (?=[^\d\r\n]*[\r\n]+.*SCC\+2) – immediately to the right, there must be:
      • [^\d\r\n]* – zero or more chars other than digits, CR and LF
      • [\r\n]+ – one or more CR or LF chars
      • .*SCC\+2 – any text on the next line up to the rightmost occurrence of SCC+2.
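
To make the breakdown above easier to follow, here is a minimal base R sketch that assembles the same pattern from its three parts. The variable names (left, digits, right, pattern, qty) are only illustrative, and it assumes the sample data from the question is stored as a single string with embedded line breaks.

```r
# Sample input taken from the question: one string with embedded line breaks.
vec <- "QTY+66:90:PCE\nSCC+2\nDTM+45:20200416:15\nQTY+66:60:PCE\nSCC+3\nDTM+35:20210614:2"

# The three parts of the pattern, kept separate so each can be read on its own.
left   <- "(?<=:)"                               # a colon right before the digits
digits <- "\\d+"                                 # the value to return
right  <- "(?=[^\\d\\r\\n]*[\\r\\n]+.*SCC\\+2)"  # rest of the line, a line break, then SCC+2
pattern <- paste0(left, digits, right)

# regexpr() locates the first match; regmatches() returns only the matched text.
# perl = TRUE is needed because lookarounds are not supported by the default engine.
qty <- regmatches(vec, regexpr(pattern, vec, perl = TRUE))
print(qty)  # [1] "90"
```

Because the lookbehind and lookahead do not consume any text, only the digits end up in the result, so no capture group is needed.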

In R, you can use

library(stringr)
str_extract(vec, "(?<=:)\\d+(?=[^\\d\r\n]*[\r\n]+.*SCC\\+2)")

And a couple of base R approaches with sub:

sub(".*?\\+\\d+:(\\d+)[^\r\n]*[\r\n]+[^\r\n]*SCC\\+2.*", "\\1", vec)
sub("(?s).*?\\+\\d+:(\\d+)(?-s).*\\R.*SCC\\+2(?s).*", "\\1", vec, perl=TRUE)
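
One thing to keep in mind with the sub approach: when the pattern does not match, sub returns the input unchanged rather than NA. The sketch below is one possible guard, assuming you would rather get NA in that case. The helper name extract_or_na and the no_tag sample string are made up for illustration.

```r
# PCRE variant of the pattern from the answer above.
pattern <- "(?s).*?\\+\\d+:(\\d+)(?-s).*\\R.*SCC\\+2(?s).*"

vec    <- "QTY+66:90:PCE\nSCC+2\nDTM+45:20200416:15\nQTY+66:60:PCE\nSCC+3\nDTM+35:20210614:2"
no_tag <- "QTY+66:90:PCE\nSCC+3"  # hypothetical input that contains no SCC+2 line

# Without a guard, a non-matching input comes back as-is.
unchanged <- sub(pattern, "\\1", no_tag, perl = TRUE)

# Hypothetical helper: return the captured group on a match, NA otherwise.
extract_or_na <- function(x) {
  ifelse(grepl(pattern, x, perl = TRUE),
         sub(pattern, "\\1", x, perl = TRUE),
         NA_character_)
}

print(extract_or_na(vec))     # [1] "90"
print(extract_or_na(no_tag))  # [1] NA
```

str_extract already returns NA when nothing matches, so this guard only matters for the base R variants.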

A complete R example:

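# Sample input: one string with embedded line breaks
vec <- "QTY+66:90:PCE\nSCC+2\nDTM+45:20200416:15\nQTY+66:60:PCE\nSCC+3\nDTM+35:20210614:2"
# Base R, default regex engine: replace the whole string with capture group 1
sub(".*?\\+\\d+:(\\d+)[^\r\n]*[\r\n]+[^\r\n]*SCC\\+2.*", "\\1", vec)
# Base R, PCRE: \R matches any line break sequence
sub("(?s).*?\\+\\d+:(\\d+)(?-s).*\\R.*SCC\\+2(?s).*", "\\1", vec, perl=TRUE)
# stringr: return only the digits matched between the lookbehind and the lookahead
library(stringr)
str_extract(vec, "(?<=:)\\d+(?=[^\\d\r\n]*[\r\n]+.*SCC\\+2)")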

All yield [1] "90".
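
Since the question also mentions fetching 60 via the SCC+3 tag, here is a rough sketch of how the lookaround pattern could be parameterized by tag number. The function name qty_before_scc is hypothetical, and the trailing (?!\d) is an extra assumption on my side (not part of the pattern above) so that SCC+2 does not also match something like SCC+20.

```r
vec <- "QTY+66:90:PCE\nSCC+2\nDTM+45:20200416:15\nQTY+66:60:PCE\nSCC+3\nDTM+35:20210614:2"

# Hypothetical helper: return the number found on the line right before "SCC+<tag>".
qty_before_scc <- function(x, tag) {
  pattern <- paste0("(?<=:)\\d+(?=[^\\d\\r\\n]*[\\r\\n]+.*SCC\\+", tag, "(?!\\d))")
  m <- regexpr(pattern, x, perl = TRUE)  # -1 where there is no match
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)       # fill in only the elements that matched
  out
}

print(qty_before_scc(vec, 2))  # [1] "90"
print(qty_before_scc(vec, 3))  # [1] "60"
print(qty_before_scc(vec, 9))  # [1] NA
```

This uses base R only, so it works without stringr. Wrap the result in as.integer() if you need a number rather than a string.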
