Home sort dataframe index by the second position from nested dictionaries

Questions

sort dataframe index by the second position from nested dictionaries

January 30, 2022

I have this code where lig_dec_residue is this nested dictionary which comes from big .dat files:

 lig_dec_residue = {'f1': {}, 'f2': {}, 'f3': {}}


def plot_lig(res):
    df = pd.DataFrame.from_dict(lig_dec_residue)
    df.index = df.index.str.split(' ')
    df.index = df.index.str[0] + ' ' + (df.index.str[1].astype(int) + int(res) - 1).astype(str)
    df = df[df <= -0.25]
    df.dropna(how='all', inplace=True)
    df.plot(kind='bar', edgecolor='black')
    plt.legend(['X var', 'Y var', 'Z var'])
    plt.show()
    plt.close()

and this is the result:

                   f1        f2        f3
ARG 403 -0.265999       NaN -0.390653
LEU 455 -1.948253 -2.125521 -1.988445
PHE 456 -1.974429 -1.835651 -2.177540
ALA 475 -0.796856 -1.032929 -0.968554
GLY 476 -0.262736 -0.744952 -0.257448
ASN 477       NaN       NaN -0.868419
PHE 486 -3.674621 -2.882512 -3.179725
ASN 487 -1.172256 -0.805725 -1.050299
LYS 493 -2.283489       NaN -5.231593
SER 496       NaN       NaN -0.366986
PHE 497       NaN -0.340862       NaN
ARG 498 -1.485091       NaN -1.140743
THR 500 -1.497597 -0.778616 -1.961580
TYR 501 -4.286950       NaN -4.851700
GLY 502 -0.447453 -0.808606 -0.702321
VAL 503 -0.256496 -0.371461 -0.977062
HIS 505 -1.420959       NaN -1.321259
LYS 417       NaN -1.115154       NaN
GLN 493       NaN -2.625195       NaN
GLY 496       NaN -1.232041       NaN
GLN 498       NaN -2.271338       NaN
ASN 501       NaN -4.152646       NaN
TYR 505       NaN -2.469813       NaN

Pandas plots the last six entries apart from the rest (look at TYR 501, ASN 501: they should be close but they are not!).

The idea is to make a comparison between f1 f2 and f3 with a bar plot.
This is my output:

Is there a way to sort the index properly? I think this output might be due to the lexicographic sorting method.
I know that there’s natsort library, but I can’t make use of if since the dataframes comes from nested dictionaries.
I would like to group the bars based on the number of the index (eg, HIS 505 next to TYR 505) for a direct comparison where applicable.

Thank you!

Ludovico

>Solution :

Use sort_index with a custom key:

df = df.sort_index(key=lambda x: x.str.split().str[1].str.zfill(5))
print(df)

# Output
               f1        f2        f3
ARG 403 -0.265999       NaN -0.390653
LYS 417       NaN -1.115154       NaN
LEU 455 -1.948253 -2.125521 -1.988445
PHE 456 -1.974429 -1.835651 -2.177540
ALA 475 -0.796856 -1.032929 -0.968554
GLY 476 -0.262736 -0.744952 -0.257448
ASN 477       NaN       NaN -0.868419
PHE 486 -3.674621 -2.882512 -3.179725
ASN 487 -1.172256 -0.805725 -1.050299
LYS 493 -2.283489       NaN -5.231593
GLN 493       NaN -2.625195       NaN
SER 496       NaN       NaN -0.366986
GLY 496       NaN -1.232041       NaN
PHE 497       NaN -0.340862       NaN
GLN 498       NaN -2.271338       NaN
ARG 498 -1.485091       NaN -1.140743
THR 500 -1.497597 -0.778616 -1.961580
TYR 501 -4.286950       NaN -4.851700
ASN 501       NaN -4.152646       NaN
GLY 502 -0.447453 -0.808606 -0.702321
VAL 503 -0.256496 -0.371461 -0.977062
HIS 505 -1.420959       NaN -1.321259
TYR 505       NaN -2.469813       NaN

Detail about the key:

>>> df.index.str.split().str[1].str.zfill(5)
Index(['00403', '00455', '00456', '00475', '00476', '00477', '00486', '00487',
       '00493', '00496', '00497', '00498', '00500', '00501', '00502', '00503',
       '00505', '00417', '00493', '00496', '00498', '00501', '00505'],
      dtype='object')

Note: padding with 0 allow you to have a natural sorting when two numbers have not the same length:

>>> '23' > '5'
False

>>> '23' > '05'
True

nested

byMR

Published January 30, 2022

Add a comment

shows variable uninitialized when calling a function that uses malloc

byMR

January 30, 2022

Questions

How to remove side scroll?

byMR

January 30, 2022

Questions

How to fine tune the redundant code in python

byMR

January 31, 2022

Questions

Why is the text value duplicating when clicking on the add button JavaScript

byMR

January 31, 2022

Questions

Legend not changing colors in ggplot2

byMR

January 31, 2022

Questions

Pointer being freed was not allocated – destructor

byMR

January 31, 2022

sort dataframe index by the second position from nested dictionaries

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Like this:

Leave a ReplyCancel reply

Read more

shows variable uninitialized when calling a function that uses malloc

How to remove side scroll?

How to fine tune the redundant code in python

Why is the text value duplicating when clicking on the add button JavaScript

Legend not changing colors in ggplot2

Pointer being freed was not allocated – destructor

Keep Up to Date with the Most Important News

sort dataframe index by the second position from nested dictionaries

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Share this:

Like this:

Leave a ReplyCancel reply

Keep Up to Date with the Most Important News

Read more

shows variable uninitialized when calling a function that uses malloc

How to remove side scroll?

How to fine tune the redundant code in python

Why is the text value duplicating when clicking on the add button JavaScript

Legend not changing colors in ggplot2

Pointer being freed was not allocated – destructor

Discover more from Dev solutions