I am trying to compare elements between two lists. One list is a predefined pattern against which new lists have to be compared. The comparison should be done between elements at the same index in both lists.
Example: list1[0] must be compared only with list2[0], list1[1] only with list2[1], and so on. The result should be True only if all the elements match.
The issue I am facing is that one element in the predefined pattern has a dynamic part, which I have to ignore when comparing. How can I achieve this?
pattern = ['Hi', 'my' , 'name is <xxxxxxxxxxx> age <yy>']
This is the defined pattern. The content inside <> is dynamic and has to be ignored.
Comparing list2 = ['Hi', 'my' , 'name is soku age 21'] against it should return True.
list3 = ['Hi', 'my', 'soku'] should return False.
How can I achieve this, given that a plain element-by-element string comparison won't work?
Another example
pattern = ['A', 'B', 'C_<xxxx>_AB']
list1 = ['A', 'B', 'C_aaaa112=22_AB']
This should return True.
>Solution :
One approach is to use all and re.fullmatch:
import re
pattern = ['Hi', 'my', r'name is .+ age \d{2}']  # raw string, so \d is not treated as a string escape
list2 = ['Hi', 'my', 'name is soku age 21']
list3 = ['Hi', 'my', 'soku']
print(all(re.fullmatch(p, l) for p, l in zip(pattern, list2)))
print(all(re.fullmatch(p, l) for p, l in zip(pattern, list3)))
Output
True
False
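One caveat worth noting: zip stops at the shorter of its inputs, so a list that is shorter than the pattern but matches its first elements would still pass. A minimal sketch of a length-aware check follows; the helper name matches is hypothetical and not part of the original answer.

```python
import re

def matches(pattern, items):
    """Return True only if both lists have the same length and every
    item fully matches the regex at the same index.
    (Hypothetical helper, introduced here for illustration.)"""
    if len(pattern) != len(items):
        return False
    return all(re.fullmatch(p, s) is not None for p, s in zip(pattern, items))

pattern = ['Hi', 'my', r'name is .+ age \d{2}']
short = ['Hi', 'my']

# Without the length check, the truncated list slips through.
print(all(re.fullmatch(p, s) for p, s in zip(pattern, short)))  # True
print(matches(pattern, short))                                  # False
print(matches(pattern, ['Hi', 'my', 'name is soku age 21']))    # True
```

Whether a shorter list should count as a mismatch depends on your requirements; the question's wording ("all the elements match") suggests it should.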
As an alternative you could use the following pattern:
pattern = ['Hi', 'my', r'name is \S+ age \d{2}']
to avoid matching whitespace characters in the dynamic part.
The pattern:
.+
matches one or more of any character except a newline (so it does match spaces), while
\S+
matches one or more non-whitespace characters. Moreover, the pattern:
\d{2}
will match exactly two contiguous digits.
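The difference between the two patterns shows up when the dynamic part contains a space. The following sketch compares them; the two-word input and the variable names are made up for illustration.

```python
import re

loose = r'name is .+ age \d{2}'    # dynamic part may contain spaces
strict = r'name is \S+ age \d{2}'  # dynamic part must be a single token

one_word = 'name is soku age 21'
two_words = 'name is soku san age 21'  # hypothetical input with a space in the name

print(bool(re.fullmatch(loose, one_word)))    # True
print(bool(re.fullmatch(strict, one_word)))   # True
print(bool(re.fullmatch(loose, two_words)))   # True
print(bool(re.fullmatch(strict, two_words)))  # False

# \d{2} together with fullmatch rejects ages that are not exactly two digits.
print(bool(re.fullmatch(strict, 'name is soku age 5')))  # False
```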
To build the pattern dynamically from user input, you could do something like below:
pattern = ['Hi', 'my', 'name is <xxxxxxxxxxx> age <yy>']
regex_pattern = [re.sub(r"<.+?>", r".+", s) for s in pattern]
print(all(re.fullmatch(p, l) for p, l in zip(regex_pattern, list2)))
print(all(re.fullmatch(p, l) for p, l in zip(regex_pattern, list3)))
Output
True
False
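Note that the substitution above leaves the rest of the template untouched, so any regex metacharacters in the literal text (such as . or parentheses) keep their special meaning. If the templates come from user input, a possible variation is to escape the literal parts with re.escape. This is a sketch under that assumption; the helper name to_regex and the 'v1.0 <name>' template are hypothetical.

```python
import re

def to_regex(template):
    """Turn a template with <...> placeholders into a regex string.
    (Hypothetical helper, introduced here for illustration.)"""
    # Split on the placeholders; the pieces in between are literal text.
    literal_parts = re.split(r'<[^<>]+>', template)
    # Escape each literal piece and put a wildcard where a placeholder was.
    return '.+'.join(re.escape(part) for part in literal_parts)

pattern = ['A', 'B', 'C_<xxxx>_AB']
list1 = ['A', 'B', 'C_aaaa112=22_AB']
regex_pattern = [to_regex(s) for s in pattern]
print(all(re.fullmatch(p, l) for p, l in zip(regex_pattern, list1)))  # True

# With escaping, the dot in the literal text only matches a real dot.
escaped = to_regex('v1.0 <name>')
print(bool(re.fullmatch(escaped, 'v1.0 beta')))  # True
print(bool(re.fullmatch(escaped, 'v1x0 beta')))  # False
```

Every placeholder becomes .+ here, regardless of its name or length; replacing that with \S+ or \d+ for specific placeholders is a design choice left to the caller.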