Java REGEX: include new line and place results in an array

I have a raw text that looks like this:

#John
age: 25
skill: boxer

#Peter
age: 25
skill: fisher

#James
age: 25
skill: bouncer

I intend to separate each block and put each in an array.

My problem is how to write a regex that says "get all matching text that starts with '#' and ends with '#'".

My goal is to fetch John's block separately from Peter's block and James's block.

If I use this:

String regex = "#(.*)";
List<String> matches = Pattern.compile(regex, Pattern.MULTILINE)
        .matcher(raw)
        .results()
        .map(MatchResult::group)
        .collect(Collectors.toList());

The array only contains:

index 0: #John
index 1: #Peter
index 2: #James

which is incomplete because it does not include the ‘age’ and ‘skill’ part of the body.
My desired outcome is this:

index 0: #John
         age: 25
         skill: boxer

index 1: #Peter
         age: 25
         skill: fisher

index 2: #James
         age: 25
         skill: bouncer

Can you please help?

>Solution :

Using a Pattern and a Matcher, we can loop over every match with find() and collect the captured blocks:

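import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The statements below belong inside a method, for example main
String input = "#John\nage: 25\nskill: boxer\n\n#Peter\nage: 25\nskill: fisher\n\n#James\nage: 25\nskill: bouncer";
List<String> items = new ArrayList<>();
// Capture each block lazily, up to the next '#' or the end of the input
String pattern = "(?s)(#.*?)\\s*(?=#|$)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(input);
int index = 0;
while (m.find()) {
    items.add(m.group(1));  // group 1 excludes the trailing whitespace
    System.out.println("index " + index++ + ": " + m.group(1));
}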

This prints:

index 0: #John
age: 25
skill: boxer
index 1: #Peter
age: 25
skill: fisher
index 2: #James
age: 25
skill: bouncer
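
The same pattern can also be used in the stream style from the question. The following is only a sketch: the class name StreamBlocks and the helper blocks are made up for illustration, and Matcher.results() requires Java 9 or later. The key difference from the question's code is that it maps each match to group(1) instead of the whole match.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class StreamBlocks {
    // Same pattern as in the answer; group 1 holds one '#' block
    // without the whitespace that follows it.
    private static final Pattern BLOCK = Pattern.compile("(?s)(#.*?)\\s*(?=#|$)");

    // Hypothetical helper name, not part of the original answer.
    static List<String> blocks(String raw) {
        return BLOCK.matcher(raw)
                .results()                      // Stream<MatchResult>, Java 9+
                .map(match -> match.group(1))   // keep only the captured block
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String raw = "#John\nage: 25\nskill: boxer\n\n"
                + "#Peter\nage: 25\nskill: fisher\n\n"
                + "#James\nage: 25\nskill: bouncer";
        List<String> blocks = blocks(raw);
        for (int i = 0; i < blocks.size(); i++) {
            System.out.println("index " + i + ": " + blocks.get(i));
        }
    }
}
```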

The regex pattern used says to match:

(?s)             enable dot all mode, so dot matches across newlines
(                capture what follows
#                match a starting #
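.*?              then lazily match all content, stopping at the earliest point where the rest of the pattern can match
)                end capture
\s*              optional trailing whitespace, kept out of the capture (written \\s* in the Java string literal)
(?=#|$)          lookahead: followed by either the next # or the end of the input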
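
One caveat with the lookahead (?=#|$) is that it stops at any '#', including one that appears in the middle of a line. If that matters for your data, a possible alternative is to split the text instead of matching it. This is a sketch that assumes every block header starts at the beginning of a line; the class name SplitBlocks and the helper blocks are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class SplitBlocks {
    // Hypothetical helper: split on the line breaks that directly precede a '#'.
    static List<String> blocks(String raw) {
        // \R+ consumes one or more line breaks; (?=#) requires a '#' right after
        // them without consuming it, so each piece keeps its leading '#'.
        // trim() drops leading and trailing whitespace around the whole text.
        return Arrays.asList(raw.trim().split("\\R+(?=#)"));
    }

    public static void main(String[] args) {
        String raw = "#John\nage: 25\nskill: boxer\n\n"
                + "#Peter\nage: 25\nskill: fisher\n\n"
                + "#James\nage: 25\nskill: bouncer";
        List<String> blocks = blocks(raw);
        for (int i = 0; i < blocks.size(); i++) {
            System.out.println("index " + i + ": " + blocks.get(i));
        }
    }
}
```

Because the split only happens where a '#' follows a line break, a value such as "skill: c# dev" stays inside its block.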