I am using networkx to flatten a parent/child hierarchy, which works well so far. However, I don’t know how to add additional columns to the result.
This is what I have so far:
import pandas as pd
data = [
    ['Unit A', 'Department Q', 'Lorem'],
    ['Unit A', 'Department R', 'Ipsum'],
    ['Unit A', 'Department S', 'dolor'],
    ['Department S', 'Office 1', 'sit'],
    ['Department S', 'Office 2', 'amet'],
    ['Unit B', 'Department X', 'consetetur'],
    ['Unit B', 'Department Y', 'sadipscing'],
    ['Unit B', 'Department Z', 'elitr'],
    ['Department Z', 'Office 3', 'sed'],
    ['Department Z', 'Office 4', 'diam'],
    ['Office 4', 'Place K', 'nonumy'],
    ['Office 4', 'Place L', 'eirmod'],
]
df = pd.DataFrame(data, columns=['Parent', 'Child', 'Description'])
Then I create the hierarchy and convert it to a DataFrame:
import networkx as nx
G = nx.from_pandas_edgelist(df, source='Parent', target='Child',
                            create_using=nx.DiGraph, edge_attr=True)
roots = (v for v, d in G.in_degree() if d == 0)
leaves = [v for v, d in G.out_degree() if d == 0]
out = (pd.DataFrame(path for root in roots
                    for path in nx.all_simple_paths(G, root, leaves))
         .add_prefix('Node_')
      )
print(out)
   Node_0        Node_1    Node_2   Node_3
0  Unit A  Department Q      None     None
1  Unit A  Department R      None     None
2  Unit A  Department S  Office 1     None
3  Unit A  Department S  Office 2     None
4  Unit B  Department X      None     None
5  Unit B  Department Y      None     None
6  Unit B  Department Z  Office 3     None
7  Unit B  Department Z  Office 4  Place K
8  Unit B  Department Z  Office 4  Place L
But how do I add the Description column to the output?
>Solution :
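Here’s a modification of your script that lets us meaningfully append those descriptions. Instead of targeting only the leaves, it passes every value of df['Child'] as a target. Since every child here has a single parent, each row of df then yields exactly one path from a root. For this data the paths come out in the same order as the rows of df, which is why the Description column can be assigned by position.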
import pandas as pd
import networkx as nx
data = [
    ['Unit A', 'Department Q', 'Lorem'],
    ['Unit A', 'Department R', 'Ipsum'],
    ['Unit A', 'Department S', 'dolor'],
    ['Department S', 'Office 1', 'sit'],
    ['Department S', 'Office 2', 'amet'],
    ['Unit B', 'Department X', 'consetetur'],
    ['Unit B', 'Department Y', 'sadipscing'],
    ['Unit B', 'Department Z', 'elitr'],
    ['Department Z', 'Office 3', 'sed'],
    ['Department Z', 'Office 4', 'diam'],
    ['Office 4', 'Place K', 'nonumy'],
    ['Office 4', 'Place L', 'eirmod'],
]
df = pd.DataFrame(data, columns=['Parent', 'Child', 'Description'])
G = nx.from_pandas_edgelist(df, source='Parent', target='Child',
                            create_using=nx.DiGraph, edge_attr=True)
roots = (v for v, d in G.in_degree() if d == 0)
out = (pd.DataFrame(path for root in roots
                    for path in nx.all_simple_paths(G, root, df['Child']))
         .add_prefix('Node_')
      )
out['Description'] = df['Description']
print(out)
The result:
    Node_0        Node_1    Node_2   Node_3  Description
0   Unit A  Department Q      None     None        Lorem
1   Unit A  Department R      None     None        Ipsum
2   Unit A  Department S      None     None        dolor
3   Unit A  Department S  Office 1     None          sit
4   Unit A  Department S  Office 2     None         amet
5   Unit B  Department X      None     None   consetetur
6   Unit B  Department Y      None     None   sadipscing
7   Unit B  Department Z      None     None        elitr
8   Unit B  Department Z  Office 3     None          sed
9   Unit B  Department Z  Office 4     None         diam
10  Unit B  Department Z  Office 4  Place K       nonumy
11  Unit B  Department Z  Office 4  Place L       eirmod
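Note that assigning df['Description'] by position only works while the paths happen to come out in the same order as the rows of df. If you would rather not rely on that, here is a minimal sketch of a variant that reads each description from the edge entering the last node of its path (edge_attr=True already stores the Description column on every edge). The shortened sample data and the names targets, paths and descriptions are my own, chosen only for illustration.

```python
import networkx as nx
import pandas as pd

# Shortened sample of the data from the question (illustration only).
data = [
    ['Unit A', 'Department Q', 'Lorem'],
    ['Unit A', 'Department S', 'dolor'],
    ['Department S', 'Office 1', 'sit'],
]
df = pd.DataFrame(data, columns=['Parent', 'Child', 'Description'])

# edge_attr=True stores the remaining columns (here: Description) on each edge.
G = nx.from_pandas_edgelist(df, source='Parent', target='Child',
                            create_using=nx.DiGraph, edge_attr=True)

roots = [v for v, d in G.in_degree() if d == 0]
targets = set(df['Child'])  # every child is a target, not only the leaves

paths = [path for root in roots
         for path in nx.all_simple_paths(G, root, targets)]

# Look up the description on the edge (second-to-last node -> last node),
# so the result does not depend on the order in which paths are produced.
descriptions = [G.edges[path[-2], path[-1]]['Description'] for path in paths]

out = pd.DataFrame(paths).add_prefix('Node_')
out['Description'] = descriptions
print(out)
```

This still assumes that every child has exactly one parent, as in your data. If a node could be reached along more than one path, it would appear once per path, each row carrying the description of the final edge of that path.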