Efficient way to transform a dictionary into a dataframe in pandas

July 15, 2022

I have a dictionary such as:

mydict = {'scaffold1': SeqRecord(seq=Seq('AGAGGTAGAGGCAGAAAACATAGTGAGCACGCTGTGTTTAAT'), id='scaffold1', name='scaffold1', description='scaffold1 0.0', dbxrefs=[]), 'scaffold2': SeqRecord(seq=Seq('GCAAAAGCAAAGCCAGATCAGAGTCCAGACAGTGAAGGCAAGACTAGTAAAGT'), id='scaffold2', name='scaffold2', description='scaffold2 0.0', dbxrefs=[])}

I would like an efficient way to process this dictionary and build a DataFrame from it with three columns:

Scaffolds: the keys of the dictionary
Seq_length: the length of the Seq string
GC%: the number of G and C letters in Seq divided by Seq_length (for example, len(Seq) for scaffold1 is 42 and it contains 18 G and C letters, so GC% = 18/42)
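
As a sanity check on the GC% definition above, here is a minimal sketch in plain Python that treats the scaffold1 sequence as an ordinary string rather than a Seq object. The helper name gc_fraction_simple is made up for illustration, and it assumes the sequence contains only unambiguous bases, in either case.

```python
def gc_fraction_simple(sequence: str) -> float:
    """Return the share of G and C letters in the sequence (0.0 if it is empty)."""
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


# The scaffold1 sequence from the question, as a plain string.
scaffold1 = "AGAGGTAGAGGCAGAAAACATAGTGAGCACGCTGTGTTTAAT"
seq_length = len(scaffold1)
gc = gc_fraction_simple(scaffold1)
print(seq_length, gc)
```

The printed length should match Seq_length, and the second value is the 18/42 ratio from the example.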

I should then get:

Scaffolds Seq_length GC%
scaffold1 42         0.428 
scaffold2 53         0.453

Efficiency matters here because my real dictionary is very large (1,046,544 keys).

Thanks a lot for your help

Solution:

You can rework the dictionary into a list of row dictionaries, one per scaffold, and pass that list to pd.DataFrame:

import pandas as pd
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
# GC returns the G+C content as a percentage (0-100). Newer Biopython
# releases deprecate it in favour of gc_fraction, which returns 0-1.
from Bio.SeqUtils import GC

mydict = {'scaffold1': SeqRecord(seq=Seq('AGAGGTAGAGGCAGAAAACATAGTGAGCACGCTGTGTTTAAT'), id='scaffold1', name='scaffold1', description='scaffold1 0.0', dbxrefs=[]), 'scaffold2': SeqRecord(seq=Seq('GCAAAAGCAAAGCCAGATCAGAGTCCAGACAGTGAAGGCAAGACTAGTAAAGT'), id='scaffold2', name='scaffold2', description='scaffold2 0.0', dbxrefs=[])}

# One row per scaffold: dictionary key, sequence length and GC content.
df = pd.DataFrame([{'Scaffolds': k,
                    'Seq_length': len(s.seq),
                    'GC%': GC(s.seq)}
                   for k, s in mydict.items()])

Output (GC reports a percentage, so divide by 100 to get the fraction shown in the question):

   Scaffolds  Seq_length        GC%
0  scaffold1          42  42.857143
1  scaffold2          53  45.283019
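
Since the question asks for a fraction rather than a percentage, and the real dictionary has over a million keys, one possible variation is to build the three columns as plain lists in a single pass and hand them to pandas at the end. This is only a sketch under stated assumptions: build_columns is a hypothetical helper, it counts only G and C (case-insensitively) instead of calling Biopython, and the stand-in records below merely mimic the .seq attribute of a SeqRecord.

```python
from types import SimpleNamespace


def build_columns(records):
    """Collect scaffold names, lengths and GC fractions in one pass.

    `records` maps a name to any object whose `.seq` converts to a string
    with str(), which is assumed to hold for Biopython SeqRecord objects.
    """
    names, lengths, gc_fractions = [], [], []
    for name, record in records.items():
        sequence = str(record.seq).upper()
        length = len(sequence)
        gc_count = sequence.count("G") + sequence.count("C")
        names.append(name)
        lengths.append(length)
        # Guard against empty sequences to avoid a division by zero.
        gc_fractions.append(gc_count / length if length else 0.0)
    return {"Scaffolds": names, "Seq_length": lengths, "GC%": gc_fractions}


# Stand-in records for illustration only; real code would pass mydict.
demo = {
    "scaffold1": SimpleNamespace(seq="AGAGGTAGAGGCAGAAAACATAGTGAGCACGCTGTGTTTAAT"),
    "scaffold2": SimpleNamespace(
        seq="GCAAAAGCAAAGCCAGATCAGAGTCCAGACAGTGAAGGCAAGACTAGTAAAGT"
    ),
}
columns = build_columns(demo)
```

Passing the result to pd.DataFrame(columns) should give the same three columns as the output above, with GC% expressed as a fraction. Whether this is actually faster than the list-of-dicts version on a million records is worth measuring rather than assuming.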

pandas

by MR

Published July 15, 2022
