I’m looking to count how many people are below a given user of the data frame.
| Employee | Manager |
|---|---|
| A | – |
| B | A |
| C | A |
| D | A |
| E | A |
| F | B |
| G | B |
| H | C |
| I | C |
I would like to get in the output:
I, H, G, F, E and D have no employees below
C has two employees (H and I) below it
B has two employees (F and G)
A has eight employees below him (B, C, D and E plus the employees of B and C)
Would anyone have any suggestions?
In my DF I have more hierarchy layers and a very large amount of data.
I thought about storing it in a dictionary and doing a loop to update it, but I believe that this solution is not efficient at all. I would like to know if there is any more efficient technique to solve this type of problem.
>Solution :
I would use a directed graph with networkx. This is a super fun python package.
import networkx as nx, pandas as pd
#set up data
employee = ['A', 'B', 'C','D','E','F','G','H','I']
manager = ['', 'A', 'A','A','A','B','B','C','C']
relations = pd.DataFrame(list(zip(employee,manager)), columns = ['Employee', 'Manager'])
# If there is no manager, make it the employee
relations.Manager = np.where(relations.Manager == '', relations.Employee, relations.Manager)
# or might need depending on data format:
relations.Manager = np.where(relations.Manager.isna(), relations.Employee, relations.Manager)
# Create tuples for 'edges'
relations['edge'] = list(zip(relations.Manager, relations.Employee))
# Create graph
G = nx.DiGraph()
G.add_nodes_from(list(relations.Employee))
G.add_edges_from(list(set(relations.edge)))
#Find all the descendants of nodes/employees
relations['employees_below'] = relations.apply(lambda row: nx.descendants(G,row.Employee), axis = 1)
returns:
Employee Manager edge employees_below
0 A A (A, A) {C, G, I, D, H, F, E, B}
1 B A (A, B) {F, G}
2 C A (A, C) {H, I}
3 D A (A, D) {}
4 E A (A, E) {}
5 F B (B, F) {}
6 G B (B, G) {}
7 H C (C, H) {}
8 I C (C, I) {}