Flatten hierarchy with networkx and append columns

March 30, 2023

I am using networkx to flatten a parent/child hierarchy, which works well so far. However, I don’t know how to add additional columns to the result.
This is what I have so far:

import pandas as pd

data = [
    ['Unit A', 'Department Q', 'Lorem'],
    ['Unit A', 'Department R', 'Ipsum'],
    ['Unit A', 'Department S', 'dolor'],
        ['Department S', 'Office 1', 'sit'],
        ['Department S', 'Office 2', 'amet'],
    ['Unit B', 'Department X', 'consetetur'],
    ['Unit B', 'Department Y', 'sadipscing'],
    ['Unit B', 'Department Z', 'elitr'],
        ['Department Z', 'Office 3', 'sed'],
        ['Department Z', 'Office 4', 'diam'],
            ['Office 4', 'Place K', 'nonumy'],
            ['Office 4', 'Place L', 'eirmod']
]

df = pd.DataFrame(data, columns=['Parent', 'Child', 'Description'])

Then I create the hierarchy and convert it to a DataFrame:

import networkx as nx

G = nx.from_pandas_edgelist(df, source='Parent', target='Child',
                            create_using=nx.DiGraph, edge_attr=True,
                           )

roots = (v for v, d in G.in_degree() if d == 0)
leaves = [v for v, d in G.out_degree() if d == 0]

out = (pd.DataFrame(path for root in roots for path in
                    nx.all_simple_paths(G, root, leaves))
        .add_prefix('Node_')
      )

print(out)

   Node_0        Node_1    Node_2   Node_3
0  Unit A  Department Q      None     None
1  Unit A  Department R      None     None
2  Unit A  Department S  Office 1     None
3  Unit A  Department S  Office 2     None
4  Unit B  Department X      None     None
5  Unit B  Department Y      None     None
6  Unit B  Department Z  Office 3     None
7  Unit B  Department Z  Office 4  Place K
8  Unit B  Department Z  Office 4  Place L

But how can I add the Description column to the output?

Solution:

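Here’s a modification of your script that lets us attach those descriptions: instead of ending the paths only at the leaves, use every Child as a target. In a tree like this one, each Parent/Child edge then ends exactly one path, so there is one output row per row of df and the Description column can be copied over. The trade-off is that paths ending at intermediate nodes (such as Unit A -> Department S) now appear as rows of their own.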

import pandas as pd
import networkx as nx

data = [
    ['Unit A', 'Department Q', 'Lorem'],
    ['Unit A', 'Department R', 'Ipsum'],
    ['Unit A', 'Department S', 'dolor'],
        ['Department S', 'Office 1', 'sit'],
        ['Department S', 'Office 2', 'amet'],
    ['Unit B', 'Department X', 'consetetur'],
    ['Unit B', 'Department Y', 'sadipscing'],
    ['Unit B', 'Department Z', 'elitr'],
        ['Department Z', 'Office 3', 'sed'],
        ['Department Z', 'Office 4', 'diam'],
            ['Office 4', 'Place K', 'nonumy'],
            ['Office 4', 'Place L', 'eirmod']
]

df = pd.DataFrame(data, columns=['Parent', 'Child', 'Description'])

G = nx.from_pandas_edgelist(df, source='Parent', target='Child',
                            create_using=nx.DiGraph, edge_attr=True,
                           )

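roots = (v for v, d in G.in_degree() if d == 0)
# Use every Child (not only the leaves) as a target, so each
# Parent -> Child edge ends exactly one path
out = (pd.DataFrame(path for root in roots for path in
                    nx.all_simple_paths(G, root, df['Child']))
        .add_prefix('Node_')
      )
# Index-aligned assignment: relies on the paths coming out in df's row order
out['Description'] = df['Description']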

print(out)

The result:

    Node_0        Node_1    Node_2   Node_3 Description
0   Unit A  Department Q      None     None       Lorem
1   Unit A  Department R      None     None       Ipsum
2   Unit A  Department S      None     None       dolor
3   Unit A  Department S  Office 1     None         sit
4   Unit A  Department S  Office 2     None        amet
5   Unit B  Department X      None     None  consetetur
6   Unit B  Department Y      None     None  sadipscing
7   Unit B  Department Z      None     None       elitr
8   Unit B  Department Z  Office 3     None         sed
9   Unit B  Department Z  Office 4     None        diam
10  Unit B  Department Z  Office 4  Place K      nonumy
11  Unit B  Department Z  Office 4  Place L      eirmod
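The line out['Description'] = df['Description'] matches rows by index, so the result above appears to be correct only because all_simple_paths yields the paths in the same depth-first order in which the rows of df happen to be listed. With a different row order, descriptions can end up next to the wrong paths. A minimal sketch that avoids this, assuming the hierarchy is a tree (each child has exactly one parent), reads each description from the last edge of its path, since from_pandas_edgelist stores the Description on the Parent -> Child edge. The helper name flatten_with_description is hypothetical, and the shuffled run at the end only illustrates that the pairing no longer depends on row order:

```python
import networkx as nx
import pandas as pd

data = [
    ['Unit A', 'Department Q', 'Lorem'],
    ['Unit A', 'Department R', 'Ipsum'],
    ['Unit A', 'Department S', 'dolor'],
    ['Department S', 'Office 1', 'sit'],
    ['Department S', 'Office 2', 'amet'],
    ['Unit B', 'Department X', 'consetetur'],
    ['Unit B', 'Department Y', 'sadipscing'],
    ['Unit B', 'Department Z', 'elitr'],
    ['Department Z', 'Office 3', 'sed'],
    ['Department Z', 'Office 4', 'diam'],
    ['Office 4', 'Place K', 'nonumy'],
    ['Office 4', 'Place L', 'eirmod'],
]
df = pd.DataFrame(data, columns=['Parent', 'Child', 'Description'])


def flatten_with_description(frame):
    """Hypothetical helper: one row per Parent -> Child edge (assumes a tree)."""
    # Store each row's Description as an attribute of its Parent -> Child edge
    G = nx.from_pandas_edgelist(frame, source='Parent', target='Child',
                                edge_attr='Description',
                                create_using=nx.DiGraph)
    roots = [v for v, d in G.in_degree() if d == 0]
    # Every Child is a target, so in a tree each edge ends exactly one path
    targets = set(frame['Child'])

    paths, descriptions = [], []
    for root in roots:
        for path in nx.all_simple_paths(G, root, targets):
            paths.append(path)
            # The description belongs to the last edge of the path
            descriptions.append(G.edges[path[-2], path[-1]]['Description'])

    # Ragged paths are padded with missing values, as in the original output
    result = pd.DataFrame(paths).add_prefix('Node_')
    result['Description'] = descriptions
    return result


out = flatten_with_description(df)
print(out)

# Shuffled rows: each path still gets the description of its own last edge
shuffled = df.sample(frac=1, random_state=0).reset_index(drop=True)
out_shuffled = flatten_with_description(shuffled)
print(out_shuffled)
```

Looking up the edge attribute costs one dictionary access per path, and it keeps the description tied to the path itself rather than to a row position.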