How to split data into groups of N lines and find the common character

December 3, 2022

I have a dataset like the one below:

data="""vJrwpWtwJgWrhcsFMMfFFhFp
jqHRNqRjqzjGDLGLrsFMfFZSrLrFZsSL
PmmdzqPrVvPwwTWBwg
wMqvLMZHhHMvwLHjbvcjnnSBnvTQFn
ttgJtRGJQctTZtZT
CrZsJsPPZsGzwwsLwLmpwMDw"""

Each line is a separate record. I want to group the lines into sets of three and find the character that all three lines in a group have in common. For example, r is the common character in the first group and Z is the common character in the second group. Below is my code:

lines = []
for i in range(len(data.splitlines())):
    lines.append(data[i])
    for j in lines:
        new_line = [k for k in j[i] if k in j[i + 1]]
        print(new_line)

It fails with a "string index out of range" error:

new_line = [k for k in j[i] if k in j[i + 1]]
IndexError: string index out of range

Solution:

For the record: this was the Advent of Code 2022 Day 3 Part 2 challenge. I kept my data in a file called input.txt and just read line by line, but this solution can be applied to a string too.
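As for the IndexError in the question: data[i] indexes the string itself, so it yields one character rather than one line, and j[i + 1] then asks a one-character string for its second character. A minimal sketch of the difference, using the first three lines of the sample data (the names first_char, first_line and error_message are only for illustration):

```python
data = """vJrwpWtwJgWrhcsFMMfFFhFp
jqHRNqRjqzjGDLGLrsFMfFZSrLrFZsSL
PmmdzqPrVvPwwTWBwg"""

# Indexing the string yields a single character, not a line.
first_char = data[0]

# Indexing the list returned by splitlines() yields a whole line.
first_line = data.splitlines()[0]

print(repr(first_char))
print(repr(first_line))

# A one-character string has no index 1, which is what j[i + 1] asks for.
try:
    first_char[1]
except IndexError as exc:
    error_message = str(exc)
    print(error_message)
```

So the lines have to come from data.splitlines() (or from reading a file) before they can be grouped.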

I converted every line into a set and used the & intersection operator. From there, I removed the newline character and converted the result to a list. s[0] is therefore the only repeated character. Like this:

with open('input.txt') as f:
    lines = f.readlines()

for i in range(0, len(lines), 3):
    # Characters shared by all three lines of the group
    common = set(lines[i]) & set(lines[i + 1]) & set(lines[i + 2])
    # readlines() keeps the trailing '\n'; discard() does not fail if the last line lacks one
    common.discard('\n')
    s = list(common)
    print(s[0])

Here’s an example using your data string. Splitting on the newline character means there is no newline left to remove afterward. The single element is also unpacked directly from the set, without converting to a list:

data = """vJrwpWtwJgWrhcsFMMfFFhFp
jqHRNqRjqzjGDLGLrsFMfFZSrLrFZsSL
PmmdzqPrVvPwwTWBwg
wMqvLMZHhHMvwLHjbvcjnnSBnvTQFn
ttgJtRGJQctTZtZT
CrZsJsPPZsGzwwsLwLmpwMDw"""


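lines = data.split('\n')
# Step through the lines three at a time
for i in range(0, len(lines), 3):
    # The (ch,) unpacking only succeeds if exactly one character is shared by the group
    (ch,) = set(lines[i]) & set(lines[i + 1]) & set(lines[i + 2])
    print(ch)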
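
Since the title asks about groups of N lines, here is a rough sketch of how the same idea might be generalized. The function name common_chars and its parameter n are hypothetical, and it assumes that an incomplete trailing group should simply be ignored and that a group may share zero or several characters:

```python
def common_chars(lines, n=3):
    """Return, for each group of n consecutive lines, the sorted characters shared by all of them."""
    result = []
    # Stop before any incomplete trailing group (assumption: leftovers are ignored).
    for start in range(0, len(lines) - n + 1, n):
        group = lines[start:start + n]
        # Intersect the character sets of every line in the group.
        shared = set.intersection(*map(set, group))
        result.append("".join(sorted(shared)))
    return result


data = """vJrwpWtwJgWrhcsFMMfFFhFp
jqHRNqRjqzjGDLGLrsFMfFZSrLrFZsSL
PmmdzqPrVvPwwTWBwg
wMqvLMZHhHMvwLHjbvcjnnSBnvTQFn
ttgJtRGJQctTZtZT
CrZsJsPPZsGzwwsLwLmpwMDw"""

print(common_chars(data.splitlines(), 3))
```

For the sample data this prints ['r', 'Z']. Returning a string per group, rather than unpacking a single element, avoids an exception when a group has no common character or more than one.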