Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Split a tuple within a dict and convert into dataframe

I have a dataframe like as shown below

td = {966: [('Feat1', -0.04),
  ('Feat2=True ', -0.02),
  ('Feat3 <= 20000.00', 0.01),
  ('Feat4=Power Supply', -0.01),
  ('Feat5=dada', -0.0)],
 879: [('Feat8=Rare', 0.02),
  ('Feat11=HV', -0.01),
  ('Feat21=Power Supply', -0.01),
  ('20000.00 < Feat3 <= 50000.00', 0.01),
  ('Feat5=dada', -0.01)]}

I would like to do the below

a) Split the tuple within dict based on , comma seperator

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

b) store the numeric part in value column of dataframe and text part in feature column of dataframe

c) repeat the key values for all values in dataframe (and store it in key column)

I tried the below but it is not efficient/elegant and doesn’t scale for big data of million rows

feature=[]
value=[]
key=[]
for k, v in td.items():
    for x in v:
        key.append(k)
        f, v  = x
        feature.append(f)
        value.append(v)
data_tuples = list(zip(key,feature,value))
pd.DataFrame(data_tuples, columns=['key','feature','value'])

I expect my output to be like as shown below

enter image description here

>Solution :

Use generator comprehension with flatten values and pass to DataFrame constructor:

df = pd.DataFrame()(k,b,c) for k, v in td.items() for b, c in v), 
                  columns=['key','feature','value'])
print (df)
   key                       feature  value
0  966                         Feat1  -0.04
1  966                   Feat2=True   -0.02
2  966             Feat3 <= 20000.00   0.01
3  966            Feat4=Power Supply  -0.01
4  966                    Feat5=dada  -0.00
5  879                    Feat8=Rare   0.02
6  879                     Feat11=HV  -0.01
7  879           Feat21=Power Supply  -0.01
8  879  20000.00 < Feat3 <= 50000.00   0.01
9  879                    Feat5=dada  -0.01
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading