Background:
I was working on a project were I needed to write some rules for text-processing. After working on this project for a couple of days and implementing some rules, I realized I needed to determine the order of the rules. No problem; we have topological sorting to help. But then I realized that I can't expect the graph to always be full. So I came up with this idea, that given a single rule with a set of dependencies (or a single dependency) I need to check the dependencies of the dependencies. Sounds familiar? Yes. This subject is very similar to depth-first-search of a graph. I am not a mathematician, nor did I study C.S. Hence, Graph Theory is a new field for me. Nevertheless, I implemented something below which works (inefficiently, I suspect).
The code:
This is my search and yield algorithm. If you run it on the examples below, you will see it visits some nodes more than once. Hence, the speculated inefficiency. A word about the input. The rules I wrote are basically Python classes, which have a class property depends
. I was criticized for not using inspect.getmro
- But this would complicate thing terribly because the class would need to inherit from each other (See example here).
def _yield_name_dep(rules_deps): global recursion_counter recursion_counter = recursion_counter +1 # yield all rules by their named and dependencies for rule, dep in rules_deps.items(): if not dep: yield rule, dep continue else: yield rule, dep for ii in dep: i = getattr(rules, ii) instance = i() if instance.depends: new_dep={str(instance): instance.depends} for dep in _yield_name_dep(new_dep): yield dep else: yield str(instance), instance.depends
Input to test:
demo_class_content =""" class A(object): depends = 'B' def __str__(self): return self.__class__.__name__ class B(object): depends = ('C','F') def __str__(self): return self.__class__.__name__ class C(object): depends = ('D', 'E') def __str__(self): return self.__class__.__name__ class D(object): depends = None def __str__(self): return self.__class__.__name__ class F(object): depends = 'E' def __str__(self): return self.__class__.__name__ class E(object): depends = None def __str__(self): return self.__class__.__name__ """ with open('demo_classes.py', 'w') as clsdemo: clsdemo.write(demo_class_content) import demo_classes as rules rule_start={'A': 'B'} def _yield_name_dep(rules_deps): # yield all rules by their named and dependencies for rule, dep in rules_deps.items(): if not dep: yield rule, dep continue else: yield rule, dep for ii in dep: i = getattr(rules, ii) instance = i() if instance.depends: new_dep={str(instance): instance.depends} for dep in _yield_name_dep(new_dep): yield dep else: yield str(instance), instance.depends if __name__ == '__main__': # this is yielding nodes visited multiple times, # list(_yield_name_dep(rule_start)) # hence, my work around was to use set() ... rule_dependencies = list(set(_yield_name_dep(rule_start))) print rule_dependencies
The questions:
- I tried classifying my work, and I think what I did is similar to DFS. Can you really classify it like this?
- How can I improve this function to skip visited nodes, and still use generators?
answers ...
Using the feedback from Gareth and other kind users of Stackoverflow, here is what I came up with. It is clearer, and also more general:
def _dfs(start_nodes, rules, visited): """ Depth First Search start_nodes - Dictionary of Rule with dependencies (as Tuples): start_nodes = {'A': ('B','C')} rules - Dictionary of Rules with dependencies (as Tuples): e.g. rules = {'A':('B','C'), 'B':('D','E'), 'C':('E','F'), 'D':(), 'E':(), 'F':()} The above rules describe the following DAG: A / \ B C / \ / \ D E F usage: >>> rules = {'A':('B','C'), 'B':('D','E'), 'C':('E','F'), 'D':(), 'E':(), 'F':()} >>> visited = [] >>> list(_dfs({'A': ('B','C')}, rules, visited)) [('A', ('B', 'C')), ('B', ('D', 'E')), ('D', ()), ('E', ()), ('C', ('E', 'F')), ('F', ())] """ for rule, dep in start_nodes.items(): if rule not in visited: yield rule, dep visited.append(rule) for ii in dep: new_dep={ ii : rules[ii]} for dep in _dfs(new_dep, rules, visited): if dep not in visited: yield dep