I am currently working on validating data, and the following regex pattern is used for an attribute known as ID:
java.util.regex.Pattern.matches("^((?i)(?!.*unknown.*)(?!\\b(misc)\\b)(?!.*tbd.*))[A-Za-z0-9-\\s]{1,}$", input_row.ID_A)
&& java.util.regex.Pattern.matches("^[A-Za-z0-9-\\s]{1,}$", input_row.ID_A)
I understand this as: if an ID attribute contains unknown, misc, or tbd it will be discarded, but if the ID consists of the characters [A-Za-z0-9-\s] it will be kept?
>Solution :
It will match a string consisting only of letters, digits, -, and whitespace, unless it begins with the word misc or contains either unknown or tbd anywhere. Because of (?i), those three words are checked case-insensitively.
(?!.*unknown.*) and (?!.*tbd.*) are negative lookaheads that match those strings anywhere because of the .* around them.
(?!\\b(misc)\\b) is a negative lookahead that matches misc with word boundaries around it. Since there’s no .* in front of it, it is only tested at the current position, which is right after ^, i.e. the beginning of the string.
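To see the difference in position, here is a minimal sketch that isolates the two kinds of lookahead. The class name, method names, and sample strings are made up for illustration, and each pattern is reduced to a single lookahead followed by .* so that only the lookahead decides the result.

```java
import java.util.regex.Pattern;

class LookaheadPositionDemo {
    // Only the "misc" lookahead; it is tested once, right after ^.
    static final Pattern MISC_AT_START = Pattern.compile("^(?i)(?!\\bmisc\\b).*$");
    // Only the "unknown" lookahead; its leading .* lets it scan the whole input.
    static final Pattern NO_UNKNOWN = Pattern.compile("^(?i)(?!.*unknown.*).*$");

    static boolean passesMisc(String s) {
        return MISC_AT_START.matcher(s).matches();
    }

    static boolean passesUnknown(String s) {
        return NO_UNKNOWN.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(passesMisc("misc 42"));         // false: misc is the first word
        System.out.println(passesMisc("MISC"));            // false: (?i) ignores case
        System.out.println(passesMisc("42 misc"));         // true: misc is not at the start
        System.out.println(passesMisc("misc42"));          // true: no word boundary after misc
        System.out.println(passesUnknown("id-unknown-7")); // false: unknown found anywhere
        System.out.println(passesUnknown("id-7"));         // true
    }
}
```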
If the pattern inside any of the negative lookaheads matches, the overall regexp match fails.
[A-Za-z0-9-\\s]{1,} matches one or more of the characters that are matched by that character class.
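If you want to check this yourself, a quick sketch like the following runs the first pattern from the question against a few sample values. The class name, the isValidId helper, and the sample IDs are assumptions for illustration and not part of your job.

```java
import java.util.regex.Pattern;

class IdPatternDemo {
    // The first pattern from the question, compiled once and reused.
    static final Pattern ID = Pattern.compile(
        "^((?i)(?!.*unknown.*)(?!\\b(misc)\\b)(?!.*tbd.*))[A-Za-z0-9-\\s]{1,}$");

    // Hypothetical helper: true when the whole input matches the pattern.
    static boolean isValidId(String id) {
        return ID.matcher(id).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"AB-123", "ab 12", "12 misc", "misc", "UNKNOWN-1", "x-tbd", "A_1", ""};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isValidId(s));
        }
    }
}
```

The first three samples are kept and the rest are rejected: misc is the first word, UNKNOWN-1 and x-tbd contain a forbidden word, A_1 contains an underscore that is not in the character class, and the empty string fails {1,}. Note that the second Pattern.matches call in the question appears to be redundant, since any input that passes the first pattern has already matched the same anchored character class.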