My input is a dataframe :
import pandas as pd
import numpy as np
df = pd.DataFrame({'node1': ['_abc-1', 'xyz-1', 'abc-1', '-xyz-2', 'xyz-2', 'abc-2'],
'p1': [1, 10, 3, 1, 2, 6],
'p2': [9, 2, 11, 4, 5, 3],
'node2': ['xyz-1', 'abc-1', '-xyz-1', 'def-2', 'def-2', '-xyz-1']})
# print(df)
# node1 p1 p2 node2
# 0 _abc-1 1 9 xyz-1
# 1 xyz-1 10 2 abc-1
# 2 abc-1 3 11 -xyz-1
# 3 -xyz-2 1 4 def-2
# 4 xyz-2 2 5 def-2
# 5 abc-2 6 3 -xyz-1
I want to keep only the rows that contains abc and xyz in the nodes and then reorganize them to have the same category of nodes in the same column. I want also to be able to say for example : Lets put the xyz nodes in the left part (it means, node1 and p1).
Thanks to you guys in an older post, I was able to build this code but I’m not getting my expected output anymore and I don’t know how to impose an order.
mask1 = (
(
df["node1"].str.contains("abc", case=False) &
df["node2"].str.contains("xyz", case=False)
)
|
(
df["node2"].str.contains("abc", case=False) &
df["node1"].str.contains("xyz", case=False)
)
)
df = df.loc[mask1]
final = df[['node1', 'node2']].apply(np.sort, axis=1).apply(pd.Series)
final.columns = ['node1', 'node2']
mask2 = final.eq(df[['node1', 'node2']]).all(axis=1)
final[['p1', 'p2']] = (
df[['p1', 'p2']].where(mask2, other=df[['p2', 'p1']].values)
)
final = final[df.columns]
print(final)
# node1 p1 p2 node2
# 0 _abc-1 1 9 xyz-1
# 1 abc-1 2 10 xyz-1
# 2 -xyz-1 11 3 abc-1
# 5 -xyz-1 3 6 abc-2
I feel like my code can be shorten much more and fixed at the same time.
Do you guys have any ideas to share ? I’m open to any suggestion.
My expected output is this (if xyz is chosen to be at left) :
# node1 p1 p2 node2
# 0 xyz-1 9 1 _abc-1
# 1 xyz-1 10 2 abc-1
# 2 -xyz-1 11 3 abc-1
# 5 -xyz-1 3 6 abc-2
>Solution :
Following your original logic, with boolean masks and simply inverting the 1/2 in the column names:
m1 = (df["node1"].str.contains("abc", case=False)
&df["node2"].str.contains("xyz", case=False))
m2 = (df["node2"].str.contains("abc", case=False)
&df["node1"].str.contains("xyz", case=False))
inverted_cols = ['node2', 'p2', 'p1', 'node1']
# or, programmatically
# d = {'2': '1', '1': '2'}
# inverted_cols = df.columns.str.replace(r'\d+$',
# lambda m: d.get(m.group(0), m.group(0)),
# regex=True)
out = pd.concat([df[m2],
df[m1].set_axis(inverted_cols, axis=1)]
).sort_index()
Output:
node1 p1 p2 node2
0 xyz-1 9 1 _abc-1
1 xyz-1 10 2 abc-1
2 -xyz-1 11 3 abc-1
5 -xyz-1 3 6 abc-2