Dictionary comprehension with a nested list

I have a string of characters and a list of characters. I wish to create a dictionary in which the keys are the characters of the string and the values are the list, only without the key character.

A string of characters:

sequence = 'ATGCG'

The list:

bases = ['C', 'T', 'A', 'G']

The resulting dictionary would be:

{'A': ['C', 'T', 'G'],
 'T': ['C', 'A', 'G'],
 'G': ['C', 'T', 'A'],
 'C': ['T', 'A', 'G'],
 'G': ['C', 'T', 'A'],
}

I tried using the following code but got a dictionary with only 4 items:

variations = {current_base: [base for base in bases if current_base != base]
              for current_base in sequence}

I’d love to get ideas regarding what I’m doing wrong. Thanks.

>Solution :

What you want to do is impossible: a dictionary cannot have duplicate keys. When a key is repeated, the later value simply replaces the earlier one.

{'A': ['C', 'T', 'G'],
 'T': ['C', 'A', 'G'],
 'G': ['C', 'T', 'A'],
 'C': ['T', 'A', 'G'],
 'G': ['C', 'T', 'A'], ## this is impossible
}
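To see the effect concretely, here is a small sketch that reruns the comprehension from the question unchanged. The second 'G' in the sequence overwrites the first, which is why only 4 entries come back:

```python
sequence = 'ATGCG'
bases = ['C', 'T', 'A', 'G']

# Same comprehension as in the question. 'G' occurs twice in the
# sequence, so its key is assigned twice and only one entry survives.
variations = {current_base: [base for base in bases if current_base != base]
              for current_base in sequence}

print(len(sequence))     # 5 characters in the sequence
print(len(variations))   # 4 keys in the dictionary
print(list(variations))  # ['A', 'T', 'G', 'C']
```

So the comprehension itself is fine; it is the choice of a dictionary keyed by base that drops the repeated character.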

You can use a list of tuples instead. I'll also take the opportunity to show you an alternative method using Python sets:

sequence = 'ATGCG'
bases = set('ACGT')
[(b, list(bases.difference(b))) for b in sequence]

NB. Since a DNA sequence can be very long but there are only 4 bases, it is even more efficient to precompute the differences once and then look them up:

sequence = 'ATGCG'
bases = set('ACGT')
diffs = {b: list(bases.difference(b)) for b in bases}
[(b, diffs[b]) for b in sequence]

Output (the order of elements within each list may vary, because sets are unordered):

[('A', ['T', 'C', 'G']),
 ('T', ['A', 'C', 'G']),
 ('G', ['T', 'A', 'C']),
 ('C', ['T', 'A', 'G']),
 ('G', ['T', 'A', 'C'])]
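
If the order of the remaining bases matters (as in the expected result in the question), one option is to precompute with a plain list comprehension instead of a set. This is only a sketch that reuses the `bases` list from the question; the names `diffs` and `pairs` are just illustrative:

```python
sequence = 'ATGCG'
bases = ['C', 'T', 'A', 'G']

# Precompute the alternatives for each base once, keeping the order of `bases`.
diffs = {b: [other for other in bases if other != b] for b in bases}

# One (base, alternatives) tuple per position, so repeated bases are kept.
pairs = [(b, diffs[b]) for b in sequence]

for pair in pairs:
    print(pair)
# ('A', ['C', 'T', 'G'])
# ('T', ['C', 'A', 'G'])
# ('G', ['C', 'T', 'A'])
# ('C', ['T', 'A', 'G'])
# ('G', ['C', 'T', 'A'])
```

Note that tuples for the same base share one list object from `diffs`; use `list(diffs[b])` if you intend to modify the lists independently.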

Alternative output using the position as the key:

{i: list(bases.difference(b)) for i, b in enumerate(sequence)}

output:

{0: ['T', 'C', 'G'],
 1: ['A', 'C', 'G'],
 2: ['T', 'A', 'C'],
 3: ['T', 'A', 'G'],
 4: ['T', 'A', 'C']}
