Update key names in a Python dictionary

I have loaded a FASTA file into a dictionary like this:

from Bio import SeqIO

alignment_file = '/Users/dissertation/Desktop/Alignment 4 sequences.fasta'

seq_dict = {rec.id : rec.seq for rec in SeqIO.parse(alignment_file, "fasta")}

This gives me the following output:

{'NC_000962.3': Seq('ctgttaccgagatttcttcgtcgtttgttcttggaaagacagcgctggggatcg...NNN'),
 'NC_008596.1': Seq('------------------------------------------------------...ccg'),
 'NC_009525.1': Seq('ctgttaccgagatttcttcgtcgtttgttcttggaaagacagcgctggggatcg...NNN'),
 'NC_002945.4': Seq('ctgttaccgagatttcttcgtcgtttgttcttggaaagacagcgctggggatcg...NNN')}

The only issue is that I would like to replace the key names with others that are easier to identify when comparing the sequences in other parts of my code. So I have tried the following:

name_list = ['Tuberculosis', 'Smegmatis', 'H37Ra', 'Bovis']

for key in seq_dict:
    for name in name_list:
        seq_dict[name[x]]= seq_dict[key]
    
seq_dict

However I get the following error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/var/folders/pq/ghtv3wj159j681vy0ny3tz9w0000gp/T/ipykernel_47822/1486954832.py in <module>
      9
---> 10 for key in seq_dict:
     11     for name in name_list:
     12         seq_dict[name[x]]= seq_dict[key]

RuntimeError: dictionary changed size during iteration

I understand that there's no easy, straightforward way of renaming keys in a dictionary, but I don't understand the error. Is there a way of doing something similar?

I have also tried this:

seq_dict.update({'NC_000962.3': 'Tuberculosis', 'NC_008596.1': 'Smegmatis', 'NC_009525.1': 'H37Ra', 'NC_002945.4': 'Bovis'})

But this gives me the following output:

{'NC_000962.3': 'Tuberculosis',
 'NC_008596.1': 'Smegmatis',
 'NC_009525.1': 'H37Ra',
 'NC_002945.4': 'Bovis'}

My desired output would look like this:

{'Tuberculosis': Seq('ctgttaccgagatttcttcgtcgtttgttcttggaaagacagcgctggggatcg...NNN'),
 'Smegmatis': Seq('------------------------------------------------------...ccg'),
 'H37Ra': Seq('ctgttaccgagatttcttcgtcgtttgttcttggaaagacagcgctggggatcg...NNN'),
 'Bovis': Seq('ctgttaccgagatttcttcgtcgtttgttcttggaaagacagcgctggggatcg...NNN')}

Does anybody have an idea on how to update these?

>Solution :

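The error occurs because the loop assigns to new keys, which adds entries to seq_dict while the for loop is still iterating over it; Python raises RuntimeError when a dictionary changes size during iteration. Instead of mutating seq_dict while iterating over it, construct a new dictionary and assign it to seq_dict in a single operation. I think this is what you're aiming for: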

seq_dict = dict(zip(name_list, seq_dict.values()))

although I’d personally want to have an explicit mapping from sequence IDs to names rather than relying on the ordering being the same.
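A minimal sketch of that explicit-mapping approach follows. It assumes the ID-to-name pairs implied by the question, and it uses plain strings as stand-ins for the Seq objects so that it runs without Biopython. The names id_to_name and renamed are illustrative, not part of the original code.

```python
# Stand-in data: short plain strings replace the Bio.Seq.Seq objects
# from the question so the example runs without Biopython installed.
seq_dict = {
    "NC_000962.3": "ctgtta",
    "NC_008596.1": "------",
    "NC_009525.1": "ctgtta",
    "NC_002945.4": "ctgtta",
}

# Explicit mapping from record ID to a friendlier name
# (pairs assumed from the question's update() attempt).
id_to_name = {
    "NC_000962.3": "Tuberculosis",
    "NC_008596.1": "Smegmatis",
    "NC_009525.1": "H37Ra",
    "NC_002945.4": "Bovis",
}

# Build a new dictionary instead of mutating seq_dict during iteration.
# IDs with no entry in id_to_name keep their original key.
renamed = {id_to_name.get(seq_id, seq_id): seq for seq_id, seq in seq_dict.items()}

print(list(renamed))  # ['Tuberculosis', 'Smegmatis', 'H37Ra', 'Bovis']
```

Because each sequence is looked up by its own ID, the result does not depend on the order in which the records appear in the FASTA file. With real data the same comprehension should work unchanged on the Seq values.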
