How to split the data into groups of N lines and find the intersecting character

I have a dataset like below:

data="""vJrwpWtwJgWrhcsFMMfFFhFp
jqHRNqRjqzjGDLGLrsFMfFZSrLrFZsSL
PmmdzqPrVvPwwTWBwg
wMqvLMZHhHMvwLHjbvcjnnSBnvTQFn
ttgJtRGJQctTZtZT
CrZsJsPPZsGzwwsLwLmpwMDw"""

These are separate lines. Now, I want to group the data in sets of 3 rows and find the character common to all lines in each group. For example, r is the common character in the first group and Z is the common character in the second group. So, below is my code:

lines = []
for i in range(len(data.splitlines())):
    lines.append(data[i])
    for j in lines:
        new_line = [k for k in j[i] if k in j[i + 1]]
        print(new_line)  

It gives me a string index out-of-range error.

new_line = [k for k in j[i] if k in j[i + 1]]
IndexError: string index out of range

Solution:

For the record: this was the Advent of Code 2022 Day 3 Part 2 challenge. I kept my data in a file called input.txt and just read line by line, but this solution can be applied to a string too.
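As for the IndexError in the question: data[i] indexes the raw string, so it returns one character rather than one line, and j[i + 1] then reads past the end of that one-character string. A minimal sketch of the problem, using a shortened copy of the question's data (the names first_item and error_message are only for illustration):

```python
data = """vJrwpWtwJgWrhcsFMMfFFhFp
jqHRNqRjqzjGDLGLrsFMfFZSrLrFZsSL
PmmdzqPrVvPwwTWBwg"""

# Indexing the raw string yields a single character, not a line
first_item = data[0]
print(repr(first_item))  # 'v'

# A one-character string has no index 1; this mirrors j[i + 1] when i == 0
try:
    first_item[1]
except IndexError as exc:
    error_message = str(exc)
    print(error_message)  # string index out of range

# Splitting first gives whole lines that can be indexed by row
lines = data.splitlines()
print(lines[0])  # vJrwpWtwJgWrhcsFMMfFFhFp
```

Splitting the string into lines before indexing is what both versions of the solution do.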

I converted every line into a set and used the & intersection operator. From there, I converted the result to a list and removed the newline character, which ends up in the intersection because readlines() keeps line endings. s[0] is therefore the one character common to all three lines. Like this:

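with open('input.txt') as f:
    lines = f.readlines()
    # Step through the file three lines at a time
    for i in range(0, len(lines), 3):
        s = list(set(lines[i]) & set(lines[i + 1]) & set(lines[i + 2]))
        # The last line of the file may lack '\n', so only remove it if present
        if '\n' in s:
            s.remove('\n')
        print(s[0])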
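To see how the & operator narrows things down, here is a small sketch on the first group from the question, one intersection at a time. The variable names (group, first_two, common) are chosen only for this illustration:

```python
group = [
    "vJrwpWtwJgWrhcsFMMfFFhFp",
    "jqHRNqRjqzjGDLGLrsFMfFZSrLrFZsSL",
    "PmmdzqPrVvPwwTWBwg",
]

# Characters that appear in both the first and the second line
first_two = set(group[0]) & set(group[1])
print(sorted(first_two))  # ['F', 'M', 'f', 'r', 's']

# Intersecting with the third line leaves the single shared character
common = first_two & set(group[2])
print(common)  # {'r'}
```

Sets are case-sensitive, so 'F' and 'f' count as different characters here.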

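Here’s an example using your data string. In this case, I’d split by the newline character, so there is no longer any need to remove it from the result. I’d also extract the element from the set without converting to a list, using tuple unpacking, which only works when the intersection holds exactly one character: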

data = """vJrwpWtwJgWrhcsFMMfFFhFp
jqHRNqRjqzjGDLGLrsFMfFZSrLrFZsSL
PmmdzqPrVvPwwTWBwg
wMqvLMZHhHMvwLHjbvcjnnSBnvTQFn
ttgJtRGJQctTZtZT
CrZsJsPPZsGzwwsLwLmpwMDw"""


lines = data.split('\n')
for i in range(0, len(lines), 3):
    (ch,) = set(lines[i]) & set(lines[i + 1]) & set(lines[i + 2])
    print(ch)
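
The same idea can be generalised from groups of 3 to groups of N lines. The helper below is a sketch rather than part of the original answer: common_chars and its parameter n are hypothetical names, n is assumed to be at least 1, and an incomplete trailing group is skipped, which is just one possible policy.

```python
def common_chars(data, n=3):
    """Return the set of shared characters for each full group of n lines."""
    lines = data.splitlines()
    # Assumption: lines left over after the last full group are ignored
    usable = len(lines) - len(lines) % n
    return [
        set.intersection(*map(set, lines[i:i + n]))
        for i in range(0, usable, n)
    ]


data = """vJrwpWtwJgWrhcsFMMfFFhFp
jqHRNqRjqzjGDLGLrsFMfFZSrLrFZsSL
PmmdzqPrVvPwwTWBwg
wMqvLMZHhHMvwLHjbvcjnnSBnvTQFn
ttgJtRGJQctTZtZT
CrZsJsPPZsGzwwsLwLmpwMDw"""

print(common_chars(data))  # [{'r'}, {'Z'}]
```

Returning whole sets instead of unpacking a single character means a group with zero or several shared characters does not raise an error; the caller decides how to handle those cases.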