I’m trying to write an algorithm in Python to get a list of all nodes in a tree where the path reaches until the last node.
Each child has an unknown number of children. It is a representation of a comment thread. For instance the for post A there are replies B and C and so on. I want to get all unique streams of conversation so just get the body of the text from each node: eg: (A B D G). So the data frame would have the following rows: 0(A B D G), 1(A B D H), 2(A C E I), 3(A C F), 4(A C J).
A
|
-----
| |
B C
| |
| -------
| | |. |
D E F. J
/ | |
G H I
This the tree I currently have
class TreeNode:
def __init__(self, data):
self.data = data
self.children = []
self.parent = None
def add_child (self, child):
child.parent = self
self.children.append(child)
def get_children_body(self):
body=""
for child in self.children:
body=body+child.data["body"]
return body
def has_children(self):
return len(self.children)>0
def get_author(self):
return self.data["author"]
Edit: this does not work but this is something I have tried
#returns list of list
def find_all_branches(node):
retList=[]
for child in node.children:
retList.append(find_all_branches_recursive(child,[node.data],retList))
return retList
def find_all_branches_recursive(node,curr_list,final_list):
if node.has_children():
for child in node.children:
curr_list.append(child.data)
find_all_branches_recursive(child,curr_list,final_list)
else:
curr_list.append(node.data)
final_list.append(curr_list)
>Solution :
Here’s a simple enough generator function:
def paths(node):
if not node.children:
yield [node.data]
return
for child in node.children:
for path in paths(child):
yield [node.data] + path
>>> list(paths(a)) # a being the root node of your tree
[['A', 'B', 'D', 'G'],
['A', 'B', 'D', 'H'],
['A', 'C', 'E', 'I'],
['A', 'C', 'F'],
['A', 'C', 'J']]