Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Expand list of tuples

I have a big list of tuples (20-30 million entries). Each sub-tuple has three pieces of information (start, end, value) and is supposed to only represent one position in a sequence. However, the program generating this data has concatenated some of the values of adjacent positions if the value is the same. Thus they look a like this:

[..., (71, 72, -0.2998250126838684),
      (72, 73, -0.2858070135116577),
      (73, 74, -0.29049500823020935),
      (74, 75, -0.3044680058956146),
      (75, 76, -0.28386199474334717),
      (76, 80, -0.27730199694633484), # PROBLEM: end - start > 1
      (80, 81, -0.2726449966430664),
      (81, 82, -0.26151400804519653),
      (82, 84, -0.2679719924926758),  # PROBLEM: end - start > 1
      (84, 85, -0.24273300170898438),
      (85, 86, -0.23799900710582733),
      (86, 87, -0.24745100736618042),
      (87, 88, -0.2568419873714447),
      (88, 89, -0.2585819959640503), ...]

To fix this I would like to create new entries in the list which separate these tuples which represent multiple positions into new tuples which represent just one position.
I thus desire an output like this:

(..., (71, 72, -0.2998250126838684),
      (72, 73, -0.2858070135116577),
      (73, 74, -0.29049500823020935),
      (74, 75, -0.3044680058956146),
      (75, 76, -0.28386199474334717),
      (76, 77, -0.27730199694633484), # New
      (77, 78, -0.27730199694633484), # New
      (78, 79, -0.27730199694633484), # New
      (79, 80, -0.27730199694633484), # New
      (80, 81, -0.2726449966430664),
      (81, 82, -0.26151400804519653),
      (82, 83, -0.2679719924926758),  # New
      (83, 84, -0.2679719924926758),  # New
      (84, 85, -0.24273300170898438),
      (85, 86, -0.23799900710582733),
      (86, 87, -0.24745100736618042),
      (87, 88, -0.2568419873714447),
      (88, 89, -0.2585819959640503), ...)

To do this, with my list of tuples called bwList I have done the following:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

replacementTuple = ()

for t in bwList:
    if t[1] - t[0] == 1:
        replacementTuple = replacementTuple + (t,)
    else:
        numNewTuples = t[1] - t[0]
        st, ed = range(t[0], t[1]), range(t[0] + 1, t[1] + 1)
        for m in range(numNewTuples):
            replacementTuple = replacementTuple + ((st[m], ed[m], t[2]) ,) 

Here the output is a tuple of tuples, not a list. I don’t mind too much either way.

This approach appears to work, but is very slow!

Is there a way to speed this up?

>Solution :

Dramatically simpler (and likely faster, mostly by virtue of amortized O(1) appends) would be to build a list that expands each tuple (possibly to the same tuple just once, but without special-casing):

replacement_list = [(newstart, newstart+1, value)
                    for start, end, value in bwList
                    for newstart in range(start, end)]

You can convert that to a tuple at the end if you like, but building it as a list first will save a lot of trouble.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading