Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to reorganize two pairs of columns and choose the order as well?

My input is a dataframe :

import pandas as pd
import numpy as np

df = pd.DataFrame({'node1': ['_abc-1', 'xyz-1', 'abc-1', '-xyz-2', 'xyz-2', 'abc-2'],
 'p1': [1, 10, 3, 1, 2, 6],
 'p2': [9, 2, 11, 4, 5, 3],
 'node2': ['xyz-1', 'abc-1', '-xyz-1', 'def-2', 'def-2', '-xyz-1']})

# print(df)

#     node1  p1  p2   node2
# 0  _abc-1   1   9   xyz-1
# 1   xyz-1  10   2   abc-1
# 2   abc-1   3  11  -xyz-1
# 3  -xyz-2   1   4   def-2
# 4   xyz-2   2   5   def-2
# 5   abc-2   6   3  -xyz-1

I want to keep only the rows that contains abc and xyz in the nodes and then reorganize them to have the same category of nodes in the same column. I want also to be able to say for example : Lets put the xyz nodes in the left part (it means, node1 and p1).
Thanks to you guys in an older post, I was able to build this code but I’m not getting my expected output anymore and I don’t know how to impose an order.

mask1 = (
    (
        df["node1"].str.contains("abc", case=False) &
        df["node2"].str.contains("xyz", case=False)
    )
    |
    (
        df["node2"].str.contains("abc", case=False) &
        df["node1"].str.contains("xyz", case=False)
    )
)

df = df.loc[mask1]

final = df[['node1', 'node2']].apply(np.sort, axis=1).apply(pd.Series)
final.columns = ['node1', 'node2']

mask2 = final.eq(df[['node1', 'node2']]).all(axis=1)

final[['p1', 'p2']] = (
    df[['p1', 'p2']].where(mask2, other=df[['p2', 'p1']].values)
)

final = final[df.columns]

print(final)
#     node1  p1  p2  node2
# 0  _abc-1   1   9  xyz-1
# 1   abc-1   2  10  xyz-1
# 2  -xyz-1  11   3  abc-1
# 5  -xyz-1   3   6  abc-2

I feel like my code can be shorten much more and fixed at the same time.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Do you guys have any ideas to share ? I’m open to any suggestion.

My expected output is this (if xyz is chosen to be at left) :

#     node1  p1  p2  node2
# 0   xyz-1   9   1  _abc-1
# 1   xyz-1  10   2  abc-1
# 2  -xyz-1  11   3  abc-1
# 5  -xyz-1   3   6  abc-2

>Solution :

Following your original logic, with boolean masks and simply inverting the 1/2 in the column names:

m1 = (df["node1"].str.contains("abc", case=False)
     &df["node2"].str.contains("xyz", case=False))

m2 = (df["node2"].str.contains("abc", case=False)
     &df["node1"].str.contains("xyz", case=False))

inverted_cols = ['node2', 'p2', 'p1', 'node1']
# or, programmatically
# d = {'2': '1', '1': '2'}
# inverted_cols = df.columns.str.replace(r'\d+$',
#                                        lambda m: d.get(m.group(0), m.group(0)),
#                                        regex=True)

out = pd.concat([df[m2],
                 df[m1].set_axis(inverted_cols, axis=1)]
               ).sort_index()

Output:

    node1  p1  p2   node2
0   xyz-1   9   1  _abc-1
1   xyz-1  10   2   abc-1
2  -xyz-1  11   3   abc-1
5  -xyz-1   3   6   abc-2
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading